the sample. Each sample gives rise to a unique spectrofluorometric set of physical parameters. By analysing the fluorescence data,
ence/absence of a specific disease, group of diseases or risk of later attaining a specific disease or a body condition, or concentration
of a specific compound or medicine.



WO 01/92859 A1



Published: with international search report

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and
Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



WO 01/92859 



PCT/DK01/00383 



Method and system for classifying a biological sample 

The present invention relates to a method of training a classification system for
characterising a biological sample, a diagnostic classification system, as well as a
method of characterising a condition in an animal or a human being by using
parameters obtained from the sample.

Background 

A need for a fast and reliable primary diagnostic tool providing information indicative
of a disease or a group of diseases has existed for years.

In US 4,755,684 (Leiner et al.) a method for tumor diagnosis by means of serum
tests is disclosed. The method includes excitation of the serum by excitation
radiation of at least one wavelength between 250 nm and 300 nm, and the fluorescence
intensity is measured at predetermined emission wavelengths. From deviations of
these measured values, a conclusion may be drawn with respect to the presence of
a neoplastic disease. Measurements at one or two excitation wavelengths are
suggested. Up to three emission wavelengths are selected for each excitation
wavelength and an intensity value is determined at each. Since very little of the
information from the fluorescence spectroscopy is used, the diagnosis is very rough
and unreliable. Only about 60 % are diagnosed correctly, and the diagnosis is limited
to a yes or no.

In WO 96/30746 and WO 98/24369 fluorescence spectra are used to screen tissue
samples in situ, wherein the tissue suspected to be dysplastic is directly
subjected to fluorescence spectroscopy. The methods are used to distinguish between
dysplastic cervical tissue and normal cervical tissue. In O'Brien K.M. et al.,
"Development and evaluation of spectral classification algorithms for fluorescence
guided laser angioplasty", IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, vol.
36, No. 4, April 1989, pages 424-430, fluorescence spectroscopy is used to
distinguish normal arterial tissue from atherosclerotic tissue. None of these methods
allows a specific diagnosis to be made based on analysis of spectra from tissue or
body fluids not directly related to the diseased tissue.

In US 5,734,587 a method of analyzing sample liquids by generating infrared
spectra of dried samples and evaluating them using a multivariate evaluation
procedure is disclosed. In the evaluation procedure the samples are assigned to
classes. The evaluation procedure is trained with samples of known classes to adjust
the parameters of the evaluation procedure, such that samples of unknown
classification can be assigned to known classes. The samples analysed are clinically
relevant liquid samples that have to be dried before generating the infrared spectra
of the samples, due to the nature of infrared spectra.

Most organic compounds absorb light in the visible or ultraviolet part of the
electromagnetic spectrum. Many molecules emit the absorbed excitation energy in the
form of fluorescence. A fluorescence spectrum is obtained by transmitting light to the
sample (excitation light) and determining the spectral distribution of the light emitted
from the sample. In the case where only one fluorescent compound is present in a
weakly absorbing solution, the spectral profile of the fluorescence will be invariant
with respect to the excitation wavelength. Only the intensity of the fluorescence will
vary with the wavelength of the excitation light in accordance with the absorption
spectrum.

If more than one fluorescent compound is present in the solution, the relation
between excitation and emission intensities rapidly becomes highly complex.
The individual compounds will absorb differently at each excitation
wavelength, the intensity and distribution of the fluorescence will vary with excitation
wavelength, and reabsorption of emitted photons might occur.

When a series of fluorescence spectra is recorded using different excitation
wavelengths, the spectra collected represent an excitation-emission matrix (EEM),
which can be displayed as a 3-dimensional landscape (Figure 1). The EEM is
specific for the specific mixture of compounds and the conditions under which it is
measured.
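
The behaviour described above can be illustrated with a small numerical sketch. It is
not part of the disclosed method: the Gaussian band shapes, the wavelength grids and
the function names (gaussian, single_fluorophore_eem, normalised_rows) are assumptions
chosen for illustration, and reabsorption is ignored. For a single compound the EEM is
modelled as the outer product of an absorption profile and an emission profile, so the
emission shape is the same at every excitation wavelength; for a mixture it is not.

```python
import numpy as np

# Hypothetical wavelength grids (nm); values are illustrative only.
excitation_nm = np.array([230.0, 240.0, 290.0, 340.0])
emission_nm = np.arange(300.0, 601.0, 1.0)


def gaussian(x, centre, width):
    """Unit-height Gaussian band used as a stand-in for a real spectrum."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


def single_fluorophore_eem(abs_centre, em_centre):
    """EEM of one compound in a weakly absorbing solution (no reabsorption).

    Rows are excitation wavelengths, columns are emission wavelengths.
    The matrix is the outer product of an absorption profile and an
    emission profile, so all rows share one shape up to a scale factor.
    """
    absorption = gaussian(excitation_nm, abs_centre, 40.0)
    emission = gaussian(emission_nm, em_centre, 30.0)
    return np.outer(absorption, emission)


def normalised_rows(eem):
    """Scale each emission spectrum to unit maximum so shapes can be compared."""
    return eem / eem.max(axis=1, keepdims=True)


eem_one = single_fluorophore_eem(abs_centre=280.0, em_centre=350.0)
eem_mix = eem_one + single_fluorophore_eem(abs_centre=340.0, em_centre=450.0)

rows_one = normalised_rows(eem_one)
rows_mix = normalised_rows(eem_mix)

# True when every emission spectrum has the same shape as the first one.
shape_invariant_one = bool(np.allclose(rows_one, rows_one[0]))
shape_invariant_mix = bool(np.allclose(rows_mix, rows_mix[0]))
```

In this sketch shape_invariant_one is true and shape_invariant_mix is false, which
mirrors the statement that the spectral profile is invariant only when a single
fluorescent compound is present.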

Summary 

It has been an object of the present invention to provide a method capable of
classifying samples with unknown properties in a system not requiring any drying,
enrichment, separation or concentration of the sample before determining the class to
which the sample belongs.

This has been made possible by subjecting the sample to fluorescence spectroscopy or a
variant thereof, whereby liquid as well as solid samples may be classified.

Thus, in a first aspect the present invention relates to a method of training a
classification system for characterising a biological sample with respect to at least one
condition, comprising

a) obtaining a biological sample from an animal, including a human, wherein
said biological sample is selected from body fluids and/or tissue, wherein
the tissue sample is not associated with said condition(s),

b) obtaining characterisation information related to each biological sample,

c) exposing the sample to excitation light within a predetermined range of
wavelength,

d) determining physical parameter(s) of light emitted from the sample,

e) repeating step a) to d) until the physical parameters of all training samples
have been determined,

f) optionally performing a data handling of the obtained physical parameters
obtaining data variables,

g) optionally performing a multivariate data analysis of the data variables
obtaining model parameters describing the variation of the data variables,

h) classifying the biological samples into at least two different classes
correlated to the characterisation information, obtaining a trained classification
system.

In a preferred embodiment the method comprises the steps of:

a) obtaining a biological sample from an animal, including a human, wherein 
said biological sample is selected from body fluids and/or tissue, wherein 
the tissue sample is not associated with said condition(s), 

b) obtaining characterisation information related to each biological sample, 

c) exposing the sample to excitation light within a predetermined range of 
wavelength, 

d) determining physical parameter(s) of light emitted from the sample,

e) repeating step a) to d) until the physical parameters of all training samples 
have been determined, 

f) performing a data handling of the obtained physical parameters obtaining 
data variables, 

g) optionally performing a multivariate data analysis of the data variables ob- 
taining model parameters describing the variation of the data variables, 



h) classifying the biological samples into at least two different classes corre- 
lated to the characterisation information, obtaining a trained classification 
system. 

In another preferred embodiment the method comprises the steps of: 

a) obtaining a biological sample from an animal, including a human, wherein 
said biological sample is selected from body fluids and/or tissue, wherein 
the tissue sample is not associated with said condition(s), 



b) obtaining characterisation information related to each biological sample,

c) exposing the sample to excitation light within a predetermined range of
wavelength,

d) determining physical parameter(s) of light emitted from the sample,

e) repeating step a) to d) until the physical parameters of all training samples
have been determined,

f) performing a data handling of the obtained physical parameters obtaining 
data variables, 

g) performing a multivariate data analysis of the data variables obtaining model 
parameters describing the variation of the data variables, 

h) classifying the biological samples into at least two different classes corre- 
lated to the characterisation information, obtaining a trained classification 
system. 
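
Steps f) to h) of the training method can be sketched as follows. This is a minimal
illustration under stated assumptions, not the implementation of the invention: the
data handling is taken to be unfolding and mean-centring, the multivariate analysis is
a PCA computed by singular value decomposition, and the classification is a
nearest-centroid rule in score space. The function names train_classifier and classify,
the class labels "A" and "B" and the synthetic data are all hypothetical.

```python
import numpy as np


def train_classifier(eems, labels, n_components=2):
    """Train a PCA plus nearest-centroid classifier on a stack of EEMs.

    eems   : array-like of shape (n_samples, n_excitation, n_emission)
    labels : sequence of class labels (the characterisation information)
    """
    eems = np.asarray(eems, dtype=float)
    labels = np.asarray(labels)
    # Step f) data handling: unfold each EEM into one row of data variables.
    X = eems.reshape(eems.shape[0], -1)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step g) multivariate analysis: PCA via singular value decomposition.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T      # shape (n_variables, n_components)
    scores = Xc @ loadings              # latent variables for each sample
    # Step h) classification: one centroid per class in score space.
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    return {"mean": mean, "loadings": loadings,
            "classes": classes, "centroids": centroids}


def classify(model, eem):
    """Assign one EEM to the class with the nearest centroid in score space."""
    x = np.asarray(eem, dtype=float).reshape(-1) - model["mean"]
    score = x @ model["loadings"]
    distances = np.linalg.norm(model["centroids"] - score, axis=1)
    return str(model["classes"][np.argmin(distances)])


# Synthetic training set (illustrative only): class "B" carries an extra band.
rng = np.random.default_rng(0)
n_excitation, n_emission = 4, 50
pattern = np.zeros((n_excitation, n_emission))
pattern[1, 10:20] = 1.0
labels = ["A" if i % 2 == 0 else "B" for i in range(10)]
eems = [1.0 + (pattern if label == "B" else 0.0)
        + rng.normal(scale=0.01, size=pattern.shape) for label in labels]

model = train_classifier(eems, labels)
predictions = [classify(model, eem) for eem in eems]
```

Any other multivariate method or classifier could take the place of the PCA and the
nearest-centroid rule; they are used here only because they are short to write down.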

In another aspect the present invention relates to a classification system for char- 
acterising a biological sample, said system comprising: 

a) a sample domain for comprising a biological sample, 

b) light means for exposing the sample to excitation light in the sample domain, 

c) a detecting means recording the physical parameter(s) of light emitted from 
the sample, 

d) optionally computing means for performing data handling of the physical pa- 
rameters, obtaining data variables, 

e) optionally processing means for providing model parameters from data vari- 
ables of the sample, 

f) at least one storage means for storing physical parameters and/or data
variables and/or model parameters of the biological sample,

g) at least one storage means for storing physical parameters and/or data
variables and/or model parameters and characterisation information of a trained
classification system,

h) means for correlating physical parameters and/or data variables and/or
model parameters from the sample with physical parameters and/or data
variables and/or model parameters of the trained system, and

i) means for displaying the characterisation class(es) of a sample.

In a preferred embodiment the system comprises: 

a) a sample domain for comprising a biological sample, 

b) light means for exposing the sample to excitation light in the sample domain, 



c) a detecting means recording the physical parameter(s) of light emitted from 
the sample, 

d) computing means for performing data handling of the physical parameters, 
obtaining data variables, 



e) optionally processing means for providing model parameters from data vari- 
ables of the sample, 

f) at least one storage means for storing physical parameters and/or data vari- 
ables and/or model parameters of the biological sample, 



g) at least one storage means for storing physical parameters and/or data vari- 
ables and/or model parameters and characterisation information of a trained 
classification system, 



h) means for correlating physical parameters and/or data variables and/or
model parameters from the sample with physical parameters and/or data
variables and/or model parameters of the trained system, and

i) means for displaying the characterisation class(es) of a sample.

In another preferred embodiment the system comprises:

a) a sample domain for comprising a biological sample, 

b) light means for exposing the sample to excitation light in the sample domain, 

c) a detecting means recording the physical parameter(s) of light emitted from 
the sample, 

d) computing means for performing data handling of the physical parameters, 
obtaining data variables, 



e) processing means for providing model parameters from data variables of the
sample,

f) at least one storage means for storing physical parameters and/or data
variables and/or model parameters of the biological sample,

g) at least one storage means for storing physical parameters and/or data
variables and/or model parameters and characterisation information of a trained
classification system,

h) means for correlating physical parameters and/or data variables and/or
model parameters from the sample with physical parameters and/or data
variables and/or model parameters of the trained system, and

i) means for displaying the characterisation class(es) of a sample.

In yet another aspect the invention relates to a method for characterising a biological
sample of an animal, including a human, comprising

a) obtaining a biological sample from the animal or human, 

b) exposing the sample to excitation light, 

c) determining the physical parameter(s) of light emitted from the sample,

d) optionally performing a data handling of the obtained physical parameters 
obtaining data variables, 



e) storing the physical parameters and/or data variables and/or model pa- 
rameters, 

f) optionally providing model parameters from data variables of the sample, 

g) obtaining physical parameters and/or data variables and/or model parame- 
ters from a trained classification system, 



h) correlating physical parameters and/or data variables and/or model pa- 
rameters from the sample with physical parameters and/or data variables 
and/or model parameters of the trained system, and 



i) displaying characterisation class(es) of the sample. 

In yet another aspect the invention relates to a method for characterising a biological 
sample of an animal, including a human, comprising 

a) obtaining a biological sample from the animal or human, 



b) exposing the sample to excitation light, 



c) determining the physical parameter(s) of light emitted from the sample,

d) performing a data handling of the obtained physical parameters obtaining
data variables,

e) storing the physical parameters and/or data variables and/or model pa- 
rameters, 

f) optionally providing model parameters from data variables of the sample, 

g) obtaining physical parameters and/or data variables and/or model parame- 
ters from a trained classification system, 

h) correlating physical parameters and/or data variables and/or model pa- 
rameters from the sample with physical parameters and/or data variables 
and/or model parameters of the trained system, and 

i) displaying characterisation class(es) of the sample. 

In yet another aspect the invention relates to a method for characterising a biological 
sample of an animal, including a human, comprising 

a) obtaining a biological sample from the animal or human, 

b) exposing the sample to excitation light, 

c) determining the physical parameter(s) of light emitted from the sample, 

d) performing a data handling of the obtained physical parameters obtaining 
data variables, 

e) storing the physical parameters and/or data variables and/or model pa- 
rameters, 

f) providing model parameters from data variables of the sample,

g) obtaining physical parameters and/or data variables and/or model
parameters from a trained classification system,

h) correlating physical parameters and/or data variables and/or model
parameters from the sample with physical parameters and/or data variables
and/or model parameters of the trained system, and

i) displaying characterisation class(es) of the sample.
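
The correlating step h) can be pictured, at the level of data variables, with the
following sketch. It assumes that the trained system stores one mean spectrum per class
and that "correlating" is taken literally as a Pearson correlation; both are
assumptions made for illustration, as are the function name characterise, the class
names class_1 and class_2 and the synthetic spectra.

```python
import numpy as np


def characterise(sample, trained):
    """Return the class whose stored mean spectrum correlates best with the sample.

    sample  : array of data variables (any shape, it is flattened)
    trained : dict mapping class name to a stored mean spectrum of equal size
    """
    x = np.asarray(sample, dtype=float).ravel()
    best_class, best_r = None, -np.inf
    for name, reference in trained.items():
        ref = np.asarray(reference, dtype=float).ravel()
        r = np.corrcoef(x, ref)[0, 1]   # Pearson correlation coefficient
        if r > best_r:
            best_class, best_r = name, r
    return best_class, best_r


# Hypothetical trained system: two classes with emission bands at 350 and 450 nm.
emission_nm = np.linspace(300.0, 600.0, 61)
trained = {
    "class_1": np.exp(-((emission_nm - 350.0) / 30.0) ** 2),
    "class_2": np.exp(-((emission_nm - 450.0) / 30.0) ** 2),
}

# An unknown sample dominated by the 450 nm band.
unknown = 0.8 * trained["class_2"] + 0.05 * trained["class_1"]
assigned_class, correlation = characterise(unknown, trained)
```

Here assigned_class is "class_2". The same comparison could equally be made on raw
physical parameters or on model parameters.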

Thus, the comparison of the sample and the classification information in the trained
classification system can be carried out on different levels of data, namely by
comparing either the physical parameters and/or the data variables and/or the model
parameters. It is likewise conceivable that two of the levels of data or all three levels
can be used in the comparison of the biological sample to the classification
information in the trained classification system.

According to the first aspect of the invention, namely the method of training a
classification system, step b), which relates to obtaining characterisation information
related to each biological sample, can be carried out at any point in time as long as the
information is available for the last step (step h) of the training method.

According to a preferred embodiment of the three aspects of the invention, the
model parameters are latent variables being weighted averages of the data
variables.

The method is preferably carried out in a classification system trained according to
the present invention.
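
As an illustration of latent variables being weighted averages of the data variables,
the sketch below computes the first principal component score of each sample in two
ways: as a matrix product and as an explicit weighted sum over the data variables. In
PCA the weights are the elements of the loading vector, so the score is strictly a
weighted sum (a linear combination). The random data and variable names are
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 5))             # 6 samples, 5 data variables (synthetic)
Xc = X - X.mean(axis=0)                 # mean-centre each data variable

# PCA via singular value decomposition; rows of vt are the loading vectors.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
weights = vt[0]                         # weights defining the first latent variable

# Score of each sample as a matrix product ...
scores_matrix = Xc @ weights
# ... and as an explicit weighted sum over the data variables.
scores_explicit = np.array(
    [sum(w * x for w, x in zip(weights, row)) for row in Xc]
)
```

Both computations give the same scores, one value per sample.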

Drawings 

Figure 1. Fluorescence landscape of typical urine sample. Intensity is given as a
function of excitation and emission wavelength.

Figure 2. Three-dimensional score plot of latent variable (component) one versus
two versus three. The 18 samples are labelled according to smoker/non-smoker
(S/N) and person (number).

Figure 3. Typical fluorescence excitation-emission landscape from a sample from a
fasting person.

Figure 4. A scatter plot of score one, two and five from a PCA model of 23 fasting
and non-fasting persons.

Figure 5. Score scatter plots from a PCA model of data from persons with benign
tumors. The plots show score 1 versus 2, 1 versus 3 and 2 versus 3.

Figure 6. A plot of the raw fluorescence data used in the analysis. The 28 spectra
from each sample are arranged successively on an arbitrary wavelength scale.

Figure 7. Influence plot of samples in two latent dimensions (bold) and in three
latent dimensions (ordinary font). The behaviour of sample five, going from poor
description (high residual) to high impact on the model (high leverage), is indicative
of an outlying behaviour. This sample is not visible as an outlier in the plot of the raw
data (Figure 6).

Figure 8. Score plot showing the samples from the cardiac patients investigation in
terms of the first principal component versus the second principal component.

Figure 9. Unfolded variable averaged fluorescence spectra for the 8 samples. The
first emission peak corresponds to excitation at 230 nm and the last emission peak
corresponds to excitation at 500 nm.

Figure 10A. PC1 vs. PC2 score plot from a PCA on auto-scaled data. Figure 10B.
PC1 score from a PCA on auto-scaled data.

Figure 11A. PC1 vs. PC2 score plot from a PCA on mean-centered data. Figure 11B.
PC1 score from a PCA on mean-centered data.

Figure 12A. Predicted vs. measured for log(concentration) with all bacteria samples
(i.e. without control sample). Figure 12B. Predicted vs. measured for
log(concentration) without the control sample and the sample containing 10 s cells.

Figure 13. Upper part: Front face fluorescence spectrum of an undiluted blood
plasma sample. Lower part: Front face fluorescence spectrum of the same sample
diluted 1:5000. Notice the different intensity scales.

Figure 14A. Excitation 230 nm as a function of the dilution. 2 is diluted 1:5000, 3 is
1:3000, 4 is 1:2000, 5 is 1:700, 6 is 1:500, 7 is 1:200, 8 is 1:100, 9 is 1:50, 10 is
1:25, 11 is 1:10, 12 is 1:5, 13 is 1:2, 14 is undiluted sample. Figure 14B. Excitation
250 nm as a function of the dilution. Same dilutions as in Figure 14A. (Measured in
front face mode).

Figure 15A. Excitation 310 nm as a function of the dilution. Same dilutions as in
Figure 14A. Figure 15B. Excitation 360 nm as a function of the dilution. Same dilutions
as in Figure 14A. (Measured in front face mode).

Figure 16A. PC1 vs. PC2 score plot from a PCA. Figure 16B. PC3 vs. PC4 score
plot from a PCA.

Figure 17. Upper part: Transmission fluorescence spectrum of an undiluted blood
plasma sample. Lower part: Transmission fluorescence spectrum of the same
sample diluted 1:5000. Notice the different intensity scales.

Figure 18A. Excitation 230 nm as a function of the dilution. 2 is diluted 1:5000, 3 is
1:3000, 4 is 1:2000, 5 is 1:1000, 6 is 1:700, 7 is 1:500, 8 is 1:200, 9 is 1:100, 10 is
1:50, 11 is 1:25, 12 is 1:10, 13 is 1:5, 14 is 1:2, 15 is undiluted sample. Figure 18B.
Excitation 250 nm as a function of the dilution. Same dilutions as in Figure 18A.
(Measured in transmission mode).

Figure 19A. Excitation 310 nm as a function of the dilution. Same dilutions as in
Figure 18A. Figure 19B. Excitation 360 nm as a function of the dilution. Same dilutions
as in Figure 18A. (Measured in transmission mode).

Figure 20A. PC1 vs. PC2 score plot from a PCA. Figure 20B. PC3 vs. PC4 score
plot from a PCA.

Figure 21A. Predicted vs. measured for front face for 1:25 to 1:5000. Figure 21B.
Predicted vs. measured for transmission for 1:25 to 1:5000.

Figure 22A. PC1 vs. PC2 score plot from a PCA on transmission samples. A, B, C,
and D are different buffers with pH values of approx. 8.5-9.0, 7.0-7.5, 5.0-6.5, and
0.1 M HCl, respectively. Numbers like 1:2 indicate the dilution factor. Figure 22B.
PC1 vs. PC2 score plot from a PCA on front face samples.

Figure 23. PC1 vs. PC2 from a PCA on all samples. F is front face and T is
transmission.

Figure 24. Unfolded fluorescence spectra at four different pH values for transmission
mode samples diluted 1:10. The difference observed for the 0.1 M HCl spectrum seen in
insert B is not observed for the other dilutions.



Detailed description of the invention

The invention relates to classification based on physical parameters obtained from
luminescence spectroscopy of light emitted from the sample. For practical
reasons most of the discussion in this description relates to fluorescence spectroscopy.
However, as described below, other physical parameters may be used in the
classification. Thus, throughout the description the term fluorescence is used as an
equivalent of any luminescence type and is to be interpreted as such, unless
inappropriate in specific embodiments.

Fluorescence spectroscopy is an extremely sensitive tool. The data obtained from a
spectrofluorimetric analysis can be considered a finger-print of the sample. Each
sample gives rise to a unique spectrofluorometric set of physical parameters.
However, as described by the present invention, analysing the fluorescence data
makes it possible to classify samples into two or more classes based on
the fluorescence spectra, if there is any systematic difference between the samples.
The difference between the samples will mostly not relate to a single component or
a few components of the sample, but rather to a combination of a wide variety of
components. This combination exhibits a pattern so complex that it is detectable by
multivariate analysis only.

Thus, by evaluating the fluorescence parameters it is possible to obtain more
information about a biological sample than by evaluating the various chemical
components in the sample individually, i.e. it is possible to obtain
inter-component information. Furthermore, there is no need to know the exact
composition of components in the sample, as it is the fluorescence finger-print rather
than the components of the sample that is detected. If so desired, in a specific
application, it may be possible to give a chemical characterisation of the information
used by the classification system. It may even in certain situations be possible to do so
directly from the mathematical parameters derived from the physical parameters.

For one sample, normally several thousand or more data variables are obtained.
The amount of data increases with the number of samples used, but the number of
data variables is constant for each sample. In the prior art it was common practice to
discard most of the spectrofluorimetric information and use only a few selective or
semi-selective physical parameters, whereas the present invention makes use of all
available information.
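
The bookkeeping behind these numbers can be sketched as follows. Each EEM is unfolded
so that its emission spectra are arranged successively in one row, giving a data matrix
with one row per sample and a fixed number of columns. The sample count, the four
excitation wavelengths, the 601 emission points and the function name unfold are
hypothetical values chosen for illustration.

```python
import numpy as np


def unfold(eems):
    """Arrange the emission spectra of each EEM successively in one row.

    eems : array of shape (n_samples, n_excitation, n_emission)
    Returns an array of shape (n_samples, n_excitation * n_emission).
    """
    eems = np.asarray(eems, dtype=float)
    return eems.reshape(eems.shape[0], -1)


# Hypothetical measurement layout: 18 samples, 4 excitation wavelengths,
# emission recorded from 200 nm to 800 nm in 1 nm steps (601 points).
n_samples, n_excitation, n_emission = 18, 4, 601
eems = np.arange(n_samples * n_excitation * n_emission, dtype=float).reshape(
    n_samples, n_excitation, n_emission
)

data_matrix = unfold(eems)
n_data_variables = data_matrix.shape[1]   # constant for every sample
```

Adding samples adds rows to data_matrix, while the number of columns (data variables)
stays the same.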

By the present invention it has become possible to obtain information regarding an
animal or a human being by subjecting a biological sample from said animal or human
being to a fluorescence analysis. Examples of the information provided by the
present invention may be any information regarding health condition, such as
information regarding presence/absence of a specific disease, group of diseases or risk
of later attaining a specific disease or a body condition, or concentration of a specific
compound or medicine.

In a first aspect the invention relates to a method of training a classification system
for characterising a biological sample. It is the purpose of the training that a
classification system is obtained, said system holding enough information to be used
for characterising an un-classified and unknown biological sample into one of the
classes of the classification system. By the term unknown is meant a sample for
which no characterisation information is known.

A further purpose of the training is to incorporate a validation that substantiates
how well future samples can be classified, and to improve the specificity and
sensitivity of the classification over time.
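
One common way to carry out such a validation is leave-one-out cross-validation, in
which each training sample in turn is left out, the classifier is rebuilt from the
remaining samples, and the left-out sample is classified. The text does not prescribe
this particular scheme; the sketch below is an assumption, as are the nearest-centroid
classifier, the function names and the synthetic, well separated data. Both classes are
assumed to be present in the training set.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, x_new):
    """Classify x_new by the closest class mean of the training data."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - x_new, axis=1)
    return classes[np.argmin(distances)]


def leave_one_out(X, y, positive):
    """Return (sensitivity, specificity) from leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    tp = tn = fp = fn = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # all samples except sample i
        predicted = nearest_centroid_predict(X[keep], y[keep], X[i])
        if y[i] == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)                # fraction of positives found
    specificity = tn / (tn + fp)                # fraction of negatives found
    return sensitivity, specificity


# Synthetic data variables (illustrative only): two well separated classes.
rng = np.random.default_rng(0)
negative_samples = rng.normal(loc=0.0, scale=1.0, size=(10, 3))
positive_samples = rng.normal(loc=10.0, scale=1.0, size=(10, 3))
X = np.vstack([negative_samples, positive_samples])
y = np.array(["negative"] * 10 + ["positive"] * 10)

sensitivity, specificity = leave_one_out(X, y, positive="positive")
```

Repeating the validation as further characterised samples are added gives a running
estimate of how the sensitivity and specificity develop.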

Samples 

The biological sample may be any sample suitable for fluorescence analysis. The
sample may be fluid or solid, as is appropriate. It is an object of the present
invention to acquire the necessary information from the sample using as few
pre-treatments as possible, preferably without any pre-treatments as such.

Accordingly, in a most preferred embodiment the sample is transferred directly from
the animal or human being to be subjected to fluorescence analysis, in order to
obtain data relating to fresh, un-treated samples. In case it is not possible to use the
biological sample directly, it may be stored, for example by freezing the sample.

A characteristic of the biological sample is that it is preferably not directly related to
the specific condition(s), in that the spectroscopy is preferably not conducted on the
tissue suspected of expressing the disease. It is therefore often possible to diagnose a
condition or a disease in an easy manner, since the biological sample to be examined,
such as a urine sample or a plasma sample, may be easily obtained.

Fluid samples may be any fluid samples obtainable from animals or human beings,
i.e. body fluids, such as biological samples selected from blood, plasma, serum,
saliva, urine, cerebrospinal fluid, tears, nasal secretions, semen, bile, lymph, milk,
sweat and/or faeces.

In a preferred embodiment the fluids are easily available fluids, such as urine
samples, milk, blood and/or serum and/or plasma samples. Most preferred are urine,
milk or saliva samples or any other samples that are obtainable without any invasive
technique.

The fluid sample is subjected to fluorescence analysis without drying, and preferably
without any other changes in concentration, such as separation and enrichment.
The fluid sample may be arranged in a sample compartment, being closed or open,
before exposing the sample to excitation light.

It is however also possible with the present invention to use tissue samples, such as
solid tissue samples, directly. Examples of tissue samples include hair and nails. The
tissue sample may be any sample, such as a biopsy of tissue, that is subjected to
fluorescence spectroscopy. In the present invention the tissue is not directly related
to the specific condition(s). Thus the term "the tissue sample is not associated with
said condition(s)" means that, for example, when classifying with respect to cancer
the tissue sample does not represent the possibly cancerous tissue, but tissue from
another part of the individual.

The biopsy may be from any tissue, such as from muscle, cutis, subcutis, kidney,
brain, and liver.

The solid samples may be classified in solid form, but it may often be necessary
to provide a liquid form of the tissue before subjecting the sample to fluorescence
spectroscopy. The liquid form of the tissue may be obtained by dissolving the tissue
or mechanically destroying the tissue, such as blending the tissue, to obtain a
suitable liquid suspension of the tissue.

Furthermore, it is possible to use a sample positioned in situ, or measured
non-invasively, i.e. not removed from its normal environment. The sample does not
need to be physically removed from its place in the animal or human being. The
invention also encompasses the possibility of in situ analysis of samples. This can be
done easily with samples like skin, hair, and nails, but it is likewise possible to conduct
the excitation and fluorescence light beams by means of light guides to and from the
liquid or tissue samples within the body. This may be accomplished by conducting the
measurements transdermally. The light guides may also be introduced into the body
via body openings, such as the mouth, nose, ears, rectum, vagina, or urethra, or the
light guides may be introduced through the blood vessels or inserted directly into
tissue. In this way various fluids may be measured in situ, as well as some of the
solid tissue samples.

In a preferred embodiment the biological sample is selected from body fluids, hair
and nails, more preferred from body fluids.

Excitation light 

15 

The physical parameters may in principle be obtained for a wide variety of excitation 
light wavelengths. The wavelengths are preferably selected to be within the range of 
from 100 nm to 1000 nm, such as from 100 to 800 nm, more preferably within the 
range of from 200 nm to 800 nm, such as from 200 nm to 600 nm. 

20 

Normally several wavelengths are used, such as from 2 to 10.000, 4 to 10.000, 2 to 
1000, 4 to 1000, 2 to 100 wavelengths, such as from 4 to 100 wavelengths, for in- 
stance 2-30, such as from 4 to 30 wavelengths, such as 2-10, such as from 4 to 10 
wavelengths, for instance 2-6 wavelengths in order to describe an excitation- 

25 emission matrix optimally. Sets of wavelength may be chosen so that each wave- 
length differs from the other by at least 0.1 nm, such as at least 0.5 nm, for instance 
at least at least 1 nm, such as at Ieast 5 nm, for instance at least 10 nm, such as at 
least 50 nm, for instance at least at least 100 nm, such as at least 150 nm, for in- 
stance at least 250 nm, such as at least 500 nm, for instance at least 600 nm, such 

30 as at least 700 nm, and at most 750 nm. 

Multiwavelength excitation may be established either sequentially by varying the 
setting of a monochromator or other dispersing or filtering device in front of a con- 
tinuous lightsource like a xenon lamp. Alternatively, the sample may be exposed to 
35 the full spectrum of a continuous light source equipped with a polychromator which 
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disperses the light spatially. Thus, different zones of the sample are exposed to ex- 
citing light of different wavelengths. Furthermore, an array of single wavelength light 
sources light e.g. lasers or light guide bundles may be used either in the sequential 
mode or in the spatially separated mode. 

Accordingly, at least 2 excitation light wavelengths are selected, such as at least 4,
at least 6, at least 8, at least 10, or more. The excitation light of each wavelength
may be used simultaneously or sequentially. In a preferred embodiment 4 wavelengths
are selected, such as excitation light having a wavelength of 230 nm, 240 nm, 290 nm,
and 340 nm. Each sample is then subjected to excitation light of each wavelength. In
another preferred embodiment 6 wavelengths are selected.

The predetermined excitation light wavelength(s) is provided by use of light sources 
as is known to a person skilled in fluorescence spectroscopy. 


Emission 

The determination of the various physical parameters is done by equipment known 
to the person skilled in the art. 

In fluorescence spectroscopy, emission light intensities at different wavelengths are
recorded for each excitation light wavelength. Preferably the emission light is
sampled at 0.5 nm or 1 nm intervals. Thereby a matrix of excitation-emission data (an
excitation-emission matrix, EEM) is obtainable for each sample. Normally the spectral
distribution of light emitted from the sample ranges from 200 nm to 800 nm.
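
By way of a non-limiting illustration, the layout of such an excitation-emission
matrix may be sketched in Python as follows. The sketch assumes the four excitation
wavelengths of the preferred embodiment and a 1 nm emission grid from 200 nm to
800 nm; the function name `empty_eem` and the use of NaN placeholders for values not
yet measured are assumptions made for the example only.

```python
import numpy as np

# Excitation wavelengths of the preferred embodiment (nm).
excitation_nm = np.array([230.0, 240.0, 290.0, 340.0])
# Emission axis sampled at 1 nm intervals from 200 nm to 800 nm inclusive.
emission_nm = np.arange(200.0, 801.0, 1.0)


def empty_eem(emission, excitation):
    """Return an (emission x excitation) matrix of NaN placeholders.

    Hypothetical helper: each column would later hold the emission spectrum
    recorded for one excitation wavelength.
    """
    return np.full((emission.size, excitation.size), np.nan)


eem = empty_eem(emission_nm, excitation_nm)
```

Each column of `eem` corresponds to one excitation wavelength and each row to one
emission wavelength, so that one such matrix is obtained per sample.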

The emitted light is detected by any suitable detector, such as a one-dimensional
detector, for example a photomultiplier. Alternatively, a scanning camera, a diode
array, a CCD or a CMOS sensor may be used, all in principle being viewed as a
two-dimensional array of several thousand or more detectors. The intensity of the
light is detected on each detector, thus permitting the whole spectrum or the whole
EEM to be obtained in a single electronic measurement.

The emitted light from the samples may be focused onto the detectors by means of
conventional focusing systems, as well as passing through diaphragms and mirrors.

Physical parameters 

Most frequently the physical parameter to be determined in order to perform a data
analysis is the intensity as a function of the excitation wavelength and/or the
emission wavelength. However, any other information contained in photoluminescence
may be obtained from the sample, such as fluorescence lifetime, phosphorescence
intensity, phosphorescence lifetime, polarisation, polarisation lifetime, anisotropy,
anisotropy lifetime, phase-resolved emission, circularly polarised fluorescence,
fluorescence-detected circular dichroism, and any time dependence of the two last
mentioned parameters.

Fluorescence intensity is easily measured at room temperature, and may therefore
be chosen for many of the samples. Furthermore, a great number of organic natural
products are known to be fluorescent. Phosphorescence measurements may, however, also
be performed at room temperature.

Luminescence lifetime in general, as well as phosphorescence lifetime, is defined
as the time required for the emission intensity to drop to 1/e of its initial value.
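
As an illustration of the 1/e definition, a lifetime may be estimated from a recorded
decay curve as sketched below. The sketch assumes a single-exponential decay
I(t) = I0 exp(-t/tau), which is an assumption of the example and not a requirement
of the method; the function name `lifetime_from_decay` and the synthetic data are
hypothetical.

```python
import numpy as np


def lifetime_from_decay(time, intensity):
    """Estimate tau for an assumed decay I(t) = I0 * exp(-t / tau).

    A straight line is fitted to log(intensity) versus time; the slope of
    that line equals -1 / tau.
    """
    slope, _intercept = np.polyfit(time, np.log(intensity), 1)
    return -1.0 / slope


# Synthetic, noise-free decay with a lifetime of 4 time units.
t = np.linspace(0.0, 20.0, 50)
decay = 100.0 * np.exp(-t / 4.0)
tau = lifetime_from_decay(t, decay)
```

At t = tau the modelled intensity has dropped to 1/e of its initial value, in
agreement with the definition above.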

When using phase-resolved fluorescence spectroscopy it is possible to suppress
Raman and scattered light, leading to very good results for multicomponent systems.

In luminescence polarisation measurements, conventional spectra are obtained by
scanning excitation spectra and measuring intensity parallel and perpendicular to
the polarisation of the exciting light. The polarisation may be calculated as the
ratio of the difference of the two measurements to the sum of the two measurements.
The anisotropy parameter is obtained by multiplying the perpendicular intensity by
two in the denominator sum of this ratio.
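
The two ratios described above may be written out as in the following minimal
sketch. The function and argument names are chosen for the example only; `i_par` and
`i_perp` denote the intensities measured parallel and perpendicular to the
polarisation of the exciting light.

```python
def polarisation(i_par, i_perp):
    """Difference of the two intensities divided by their sum."""
    return (i_par - i_perp) / (i_par + i_perp)


def anisotropy(i_par, i_perp):
    """Same ratio, with the perpendicular intensity counted twice in the sum."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


# Illustrative intensities in arbitrary units.
p_value = polarisation(3.0, 1.0)
r_value = anisotropy(3.0, 1.0)
```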

Processing

The detectors are preferably coupled to a computer for further processing of the
data. The physical parameters measured or determined by the detector are processed
to a form suitable for the further mathematical calculations. This is done by
allocating data variables to each physical parameter determined, thus obtaining data
variables related to the physical parameters.

The physical parameters determined are often subjected to a data analysis through
the data variables, arranged for example as a one-way matrix of spectral information,
a two-way matrix of spectral information, a three-way matrix of spectral information,
a four-way matrix of spectral information, or a five-way or higher-order matrix of
spectral information.

Characterisation information

To obtain information relating to a specific condition in the animal or human being it
is of importance that the data relating to the spectra obtained are correlated to
characterisation information regarding the same biological sample. The information
regarding the biological sample is preferably obtained substantially simultaneously
with the biological sample; however, for characterisation data that do not vary
substantially it is sufficient to obtain the data after having obtained the physical
parameters.

The characterisation information relates to the classes intended to be generated
through the training period.

The characterisation information relates for instance to the presence or absence
of a physical condition, such as a specific disease, or to information regarding
smoking, drinking, abuse of drugs, nutritive condition, etc. Furthermore, the
characterisation information may include information such as sex, race, age or the
like that is relevant for the classification. The characterisation information may
also be information regarding responsiveness to a treatment as well as information
regarding side effects of a treatment. The characterisation information may comprise
both qualitative and quantitative information.

Predictive information regarding an individual's risk of acquiring a condition or
disease may also be obtained by the present invention. The training of the system may
be conducted by subjecting a cohort of individuals to successive sample analysis
and classifying the samples into groups of individuals acquiring the disease and
groups staying healthy during the period of sampling.

The characterisation information must be correlated to the spectral information ob- 
tained from the sample, in order to obtain the trained system ready for testing un- 
known samples. 


Validity 

Each sample is subjected to fluorescence spectroscopy before the data analysis is
performed in the training of the classification system. The sample may be one sample
from each animal or human being, or several samples from the same individual,
each sample obtained at a different time interval or from different fluids or from
different instruments.

Depending on the classes to be identified when training the classification system it
is of importance to train the system with a sufficient number of samples. The
sufficient number of samples is primarily determined by the similarity of the
fingerprints of different classes. Indirectly, this can often be related to the
number of expected latent variables, wherein the latent variables are weighted
averages of the data variables. It is preferred that the ratio of the number of
training samples to the expected number of latent variables is at least 5:1,
preferably at least 10:1. More preferably the ratio is 50:1, and even more preferably
100:1. The more training samples, the more reliable the system. Training is a
continual improvement of the system, and any sample is also a training sample,
although weighted decreasingly over time.

The samples being classified in each class are preferably a representative group of
samples to allow the most reliable classification, wherein representative is meant to
mean exhibiting all variations influencing said classification. These variations can
for example be age, sex, medication, existing disease, and race, chosen to match the
population for which the classification system is designed.






Mathematics 

A central aspect of the invention is the performance of a multivariate analysis,
whereby the data variables relating to the physical parameters are evaluated and
model parameters are obtained. The model parameters describe the variation of the
data variables. Thereby the samples are classified uniquely into classes. The
identification of the classes is obtained when each sample is correlated to the
characterisation information relating to said sample. Correlation in this respect is
not necessarily a mathematical correlation. Correlation in this respect may also
comprise the possibility of performing a comparison of data or fluorescence spectra.

Preferably, the model parameters are latent variables being weighted averages of 
the data variables. 

Membership of a class is identified when the data variables of a sample are input to
a trained classification system yielding qualitative and/or quantitative information
as to whether the sample belongs to a class.

In performing the data analysis it is often an advantage that the characterisation
information is already available. Thereby it becomes possible to detect exactly those
structures in the data that are relevant for detecting the difference between the
classes and not just structures that may not be relevant for the classification.

The multivariate statistical methods suitable for the present invention are for
example represented by chemometric methods like principal component analysis (PCA),
partial least squares regression (PLS), soft independent modelling of class analogy
(SIMCA) and principal variables (PV).

A non-exclusive list of other multivariate statistical methods includes: principal
component analysis [14], principal component regression [14], factor analysis [2],
partial least squares [14], fuzzy clustering [16], artificial neural networks [6],
parallel factor analysis [4], Tucker models [13], generalized rank annihilation
method [9], locally weighted regression [15], ridge regression [3], total least
squares [10], principal covariates regression [7], Kohonen networks [12], linear or
quadratic discriminant analysis [11], k-nearest neighbors based on rank-reduced
distances [1], multilinear regression methods [5], soft independent modeling of class
analogies [8], robustified versions of the above and/or obvious non-linear versions
such as those obtained by allowing for interactions or cross-products of variables,
exponential transformations etc.

The term "describing the variation of the data variables" means that the latent
variables retain the relevant information regarding the variation, whereas "noise"
preferably does not contribute significantly to the latent variables.

As an example of a multivariate data analysis technique, the use of principal
component analysis - PCA - will be outlined [Jackson 1991] as this technique will be
used in the following exemplary applications. An $I \times J$ data matrix,
$\mathbf{X}$, is given where $I$ is the number of rows (samples) and $J$ is the number
of columns (variables). The number of variables will typically exceed the number of
samples by far. This poses the practical problem that the matrix is typically
ill-conditioned. Thus, any traditional analysis using the whole set of raw data will
lead to useless results due to the numerical problems involved in handling the large
amount of data.

Using PCA, the original $J$ variables are replaced by $F$ latent variables which, in
this case, are also called principal components. These latent variables are found as
weighted averages of the original variables in such a way that they provide the best
possible description of the data in a least squares sense. Each latent variable
consists of a score vector $\mathbf{t}$ ($I \times 1$) and a loading vector
$\mathbf{p}$ ($J \times 1$). The loading vector is constrained to norm one and the
score vector is found by regressing $\mathbf{X}$ onto $\mathbf{p}$:

$$\mathbf{t} = \mathbf{X}\mathbf{p}/(\mathbf{p}^{\mathsf{T}}\mathbf{p}) = \mathbf{X}\mathbf{p}$$

For the first latent variable it holds that it minimizes

$$\|\mathbf{X} - \mathbf{t}\mathbf{p}^{\mathsf{T}}\|_F^2$$

where $\|\cdot\|_F^2$ denotes the squared Frobenius norm. Thus, the first latent
variable provides the least squares best-fitting rank-one model of $\mathbf{X}$. The
second latent variable is found under the constraint that the second score vector
$\mathbf{t}_2$ is orthogonal to the first score $\mathbf{t}_1$ and that the second
loading vector $\mathbf{p}_2$ is orthogonal to the first loading $\mathbf{p}_1$. Under
this restriction, the second latent variable is found such that it provides the best
possible fit to the data. Extracting $F$ such components will yield a rank $F$ model
of the data. Let the score matrix $\mathbf{T}$ ($I \times F$) hold the score vectors
$\mathbf{t}_f$, $f = 1,..,F$, and the loading matrix $\mathbf{P}$ ($J \times F$) hold
the loading vectors $\mathbf{p}_f$, $f = 1,..,F$, of this solution. It then holds that
$\mathbf{T}$ and $\mathbf{P}$ provide the solution to

$$\arg\min_{\mathbf{G},\mathbf{R}} \|\mathbf{X} - \mathbf{G}\mathbf{R}^{\mathsf{T}}\|_F^2$$

and thus provide the best-fitting rank $F$ solution. In practice, the solution to this
problem can be found using a truncated singular value decomposition of $\mathbf{X}$.
If $\mathbf{U}_F$ holds the first $F$ left singular vectors, $\mathbf{V}_F$ holds the
first $F$ right singular vectors and $\mathbf{S}_F$ is an $F \times F$ diagonal matrix
holding the first $F$ singular values on its diagonal, then it holds that

$$\mathbf{T} = \mathbf{U}_F\mathbf{S}_F; \quad \mathbf{P} = \mathbf{V}_F.$$
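
The relation between the truncated singular value decomposition and the scores and
loadings may be sketched as follows. This is a minimal illustration using NumPy and
random data of an arbitrary size; the function name `pca_svd` is hypothetical and the
data carry no experimental meaning.

```python
import numpy as np


def pca_svd(X, n_components):
    """Return scores T (I x F) and loadings P (J x F) from a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # T = U_F S_F
    P = Vt[:n_components].T                     # P = V_F
    return T, P


# Random stand-in data: 18 samples, 200 variables (more variables than samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 200))
T, P = pca_svd(X, n_components=3)
```

Because the columns of P have norm one and are mutually orthogonal, the scores
coincide with the projection X P, in agreement with the expression for t above.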

In order to choose the appropriate number of components, $F$, several strategies are
possible. One approach is to use cross-validation [Wold 1978] in which elements are
left out of the data in turn. For each set of elements left out, a model is fitted to
the remaining data and the model

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}}$$

is used to estimate the left out elements. After all elements have been left out once,
the thus obtained residuals are used for calculating the predicted residual sum of
squares (PRESS), and the number $F$ for which PRESS is at its minimum is usually
taken to be the appropriate number of components. For exploratory purposes, it is
usually sufficient to simply retain the first 2-5 components because these, by
definition, retain most of the variation in $\mathbf{X}$. It is noted that if
cross-validation is to be performed in this way, special algorithms have to be used
because of the missing values in the data [Grung & Manne 1998].

The practical usefulness of PCA arises because of the information-preserving
compression of the data based on the empirical observations rather than on theoretical
derivations. The scores $\mathbf{T}$ can be seen as the coordinates of $\mathbf{X}$ in
the reduced space defined by the truncated basis $\mathbf{P}$, and the latent
variables therefore provide a condensation of the original $J$ variables into $F$ new
ones. This condensed representation is feasible because it allows a holistic
visualization of the structure in the data and because it makes it possible to do
quantitative analysis such as regression and classification in a straightforward way.

In some situations, the interest is specifically to make a quantitative model relating
multivariate data to one or more responses by a regression model. This way, it is
possible to measure future samples by the multivariate approach and then predict
the response from the regression model. Such a regression problem suffers from
the same problems as outlined above, and for the same reason rank-reduced regression
is often employed. As an example of such, the partial least squares regression - PLS -
method will be described.



As before, a multivariate set of data $\mathbf{X}$ ($I \times J$) is available and
further a response vector $\mathbf{y}$ ($I \times 1$) is given. More responses can be
handled as well, but this is not pursued here. The aim is to find a regression vector
$\mathbf{b}$ that provides a feasible solution to the regression problem

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$



where $\mathbf{e}$ ($I \times 1$) is a vector of unmodelled residuals. Using multiple
linear regression ($J < I$) or similar approaches, it is possible to obtain a minimum
variance unbiased estimate of $\mathbf{b}$, but due to the constraint of being
unbiased, the variance will, in practice, make the estimate useless for predictions in
the situations considered here [de Jong 1995, Martens & Naes 1987, Wold et al. 1984].
Instead, a regression vector is sought that yields a low mean squared error, hence
relaxing the restriction of being unbiased and focusing on low total error. In PLS,
this is achieved by extracting components sequentially such that each extracted score
vector has maximal covariance with the yet unexplained variation in the response
[Bro 1996, Martens & Naes 1989]. Usually $\mathbf{X}$ and $\mathbf{y}$ are centered by
subtracting the column average from each column, thereby removing possible offsets.
For centered $\mathbf{X}$ and $\mathbf{y}$, the first component is determined by
defining a weight vector as

$$\mathbf{w} = \mathbf{X}^{\mathsf{T}}\mathbf{y}/\|\mathbf{X}^{\mathsf{T}}\mathbf{y}\|$$



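From this vector, the score vector $\mathbf{t}$ is defined as

$$\mathbf{t} = \mathbf{X}\mathbf{w}$$

and finally a loading vector $\mathbf{p}$ is defined as

$$\mathbf{p} = \mathbf{X}^{\mathsf{T}}\mathbf{t}/(\mathbf{t}^{\mathsf{T}}\mathbf{t}).$$

The rank-one model of $\mathbf{X}$ is then given by $\mathbf{t}\mathbf{p}^{\mathsf{T}}$
and the regression model relating the bilinear model of $\mathbf{X}$ to $\mathbf{y}$
is defined by the scalar

$$r = \mathbf{y}^{\mathsf{T}}\mathbf{t}/(\mathbf{t}^{\mathsf{T}}\mathbf{t}),$$

giving the initial prediction of $\mathbf{y}$ as $\mathbf{t}r$. The model of
$\mathbf{X}$ ($\mathbf{t}\mathbf{p}^{\mathsf{T}}$) and the prediction of $\mathbf{y}$
($\mathbf{t}r$) are subtracted from $\mathbf{X}$ and $\mathbf{y}$, and the following
component is determined similarly to the above but using the residuals of $\mathbf{X}$
and $\mathbf{y}$ as input. After calculating $F$ components in this manner, the
following matrices and vectors are given: $\mathbf{T}$ ($I \times F$), $\mathbf{P}$
($J \times F$), $\mathbf{W}$ ($J \times F$), $\mathbf{r}$ ($F \times 1$). The
regression vector can then be determined as

$$\mathbf{b} = \mathbf{W}(\mathbf{P}^{\mathsf{T}}\mathbf{W})^{-1}\mathbf{r}.$$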

As for PCA, cross-validation is usually employed to determine the optimal rank of
the model with the only difference being that whole samples are excluded in each
cross-validation segment, and the residual error determined is the response error.
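
The sequential extraction of PLS components described above may be sketched as
follows for a single response. The sketch assumes that the weight vector is the
normalised product of the transposed residual data matrix and the residual response,
which is a common choice but is stated here as an assumption; the function name
`pls1` and the synthetic data are hypothetical.

```python
import numpy as np


def pls1(X, y, n_components):
    """Return the regression vector b for centred X and y (single response)."""
    Xr = X - X.mean(axis=0)          # centre each column of X
    yr = y - y.mean()                # centre the response
    n_vars = Xr.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    r = np.zeros(n_components)
    for f in range(n_components):
        w = Xr.T @ yr                # assumed weight definition
        w = w / np.linalg.norm(w)
        t = Xr @ w                   # score vector
        tt = t @ t
        p = Xr.T @ t / tt            # loading vector
        r[f] = yr @ t / tt           # scalar regression coefficient
        Xr = Xr - np.outer(t, p)     # subtract the rank-one model of X
        yr = yr - t * r[f]           # subtract the prediction of y
        W[:, f], P[:, f] = w, p
    return W @ np.linalg.solve(P.T @ W, r)   # b = W (P^T W)^-1 r


# Synthetic, noise-free data: with all five components b is recovered exactly.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
b_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_demo = X_demo @ b_true
b_hat = pls1(X_demo, y_demo, n_components=5)
```

In practice fewer components than variables would be retained, the number being
chosen by cross-validation as described above.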

Other variables 

The classification system may be obtained on the spectral information only. However,
in some situations it may be appropriate to incorporate other variable(s) in the
multivariate analysis.

These other variables may be variables relating to the sample supplying the spectral
information or they may be variables that compensate for a specific condition of the
sample.

Examples hereof are measurements of pH, electrolytes or temperature in the sample
before subjecting it to spectroscopy, and clinical parameters. Thereby variations
in the other variables may be compensated for in the final classification.

In another embodiment other variables are variables relating to the animal, including
a human being, to be characterised. Non-exclusive examples of these variables are
hair colour, skin colour, age, sex, geographic origin, affiliation, prior diseases,
hereditary background, medication intake, body conditions (such as e.g. surgery),
stress level, medical diagnoses, subjective evaluations, and other diagnostic tests
(e.g. immunoassays, x-ray diagnosis, genomic information or an earlier chemometric
test).

Pre-treatment 


It is an advantage of the present invention that no pre-treatment of the sample is 
normally necessary. 

However, for some of the samples or applications it may be necessary or convenient
to perform an adjustment before subjecting the sample to spectroscopy.

Examples of pre-treatment may be adjustment of the pH of the sample to a predetermined
value, or heating or cooling the sample to a predetermined temperature. The
sample may be treated with chemicals (complexing agents etc.) in order to develop
e.g. fluorescent complexes involving inherently non-fluorescent molecules in the
sample.

Other types of pre-treatment include addition of chemical substances, measurement
under a gradient imposed by varying additions of chemical substances, and simple
chromatographic pre-treatments based on either chemical or physical separation
principles.

Classification system 

Another aspect of the present invention is the classification system for
characterising a biological sample into at least one predetermined class.

When the classification system has been trained as discussed above, it is ready for
classifying samples with unknown characteristics. The classification system
preferably comprises the following components:

a) a sample domain for comprising a biological sample,

b) light means for exposing the sample to excitation light in the sample domain,

c) a detecting means recording the physical parameter(s) of light emitted from
the sample,

d) optionally computing means for performing data handling of the physical
parameters, obtaining data variables,

e) optionally processing means for providing model parameters from data variables
of the sample,

f) at least one storage means for storing physical parameters and/or data variables
and/or model parameters of the biological sample,

g) at least one storage means for storing physical parameters and/or data variables
and/or model parameters and characterisation information of a trained
classification system,

h) means for correlating physical parameters and/or data variables and/or
model parameters from the sample with physical parameters and/or data
variables and/or model parameters of the trained system, and

i) means for displaying the characterisation class(es) of a sample.

The sample domain may be a sample chamber for accommodating a container with
a liquid, a solid or a semi-solid sample. However, the sample domain may also be a
domain in the individual to be classified in that the analysis can be performed on a
sample in situ such as in the blood vessels or on the superficial body parts such as
skin, nails, or hair.

The classification system may be provided as a whole unit, wherein the spectroscopy
of the sample is conducted by the same unit from where the data relating to
the characterisation classes of the sample is displayed.



It is however contemplated within the scope of the present invention that the system
is comprised of at least two units, wherein one unit performs the steps a) to f),
and another unit performs the steps g) to i). Other units comprising other parts
of the system are also contemplated, such as one unit performing the steps a) to d)
and storage means for storing physical parameters and/or data variables from f) or
a) to g), and the other unit comprising the rest of the parts. Yet another
configuration comprises steps a) to c) in one unit and the remaining steps in the
other unit.

With the system thus divided into at least two units, it is possible to obtain the
spectroscopic information from a wide variety of decentralised locations and perform
the processing centrally. The data or the classification system may then be
transmitted by any suitable means, such as conventional data transmission lines, for
example telephone lines, or via internet or intranet connections.

This facilitates the use of the classification system since any physician may provide
the biological sample, have it subjected to spectroscopic analysis at his or her
clinic and have the data correlated centrally without needing to be capable of
conducting this processing. The physician may then call the central unit to request
the correlation and classification of the sample data. Depending on the transmission
mode and equipment, the result may be displayed on a screen, printed on paper,
or communicated by telephone.

In addition to the result, other information may be provided, such as information
regarding sample errors, for example that the test requires a urine sample and not a
serum sample, and information about the statistics, such as fuzziness, the degree of
membership of a group, power and significance.

Diagnosis

In principle the classification system trained according to the invention may be used
to characterise any biological sample with respect to any kind of information.

Interesting parts of the present invention relate to the possibilities of diagnosing a
condition or the risk of acquiring a condition, such as a physical condition in an
animal or a human being, from a spectrofluorimetric analysis of a biological sample
from said animal or human being and relating the spectroscopic data with data in the
classification system.






As with any other diagnostic tool, the present invention may give a strong indication
of a disease or condition or a risk of such disease or condition, but for many
diagnoses this indication may have to be confirmed by more specific diagnostic
methods, more precisely directed to the specific diagnostic area. However, due to the
simplicity of the present invention, the precise diagnosis may be obtained faster and
much more cost-effectively than by hitherto known methods.

The disease detected may be any disease that provides a combination of components
in the biological sample that is detectable as a pattern by the fluorescence
spectroscopy.

Thus the disease may be selected from any official disease classification system,
such as, but not limited to, ICD-9/10 (WHO's official international classification
list) and ICIDH-2 (International Classification of Functioning and Disability). Such
classification systems include at least the following groups (the numbers in brackets
refer to the ICD-9 list) of human diseases as well as similar diseases related to
other animals:

Infectious and parasitic diseases (001-139)
Neoplasms (140-239)
Endocrine, nutritional and metabolic diseases, and immunity disorders (240-279)
Diseases of the blood and blood-forming organs (280-289)
Mental disorders (290-319)
Diseases of the nervous system and sense organs (320-389)
Diseases of the circulatory system (390-459)
Diseases of the respiratory system (460-519)
Diseases of the digestive system (520-579)
Diseases of the genitourinary system (580-629)
Complications of pregnancy, childbirth, and the puerperium (630-677)
Diseases of the skin and subcutaneous tissue (680-709)
Diseases of the musculoskeletal system and connective tissue (710-739)
Congenital anomalies (740-759)
Certain conditions originating in the perinatal period (760-779)
Injury and poisoning (800-999)

The sample may be classified to belong to a class for any of the diseases above,
quickly leading the examining physician to the most likely diagnosis. The sample
classification may be confirmatory or it may have to be confirmed by more specific
diagnostic tools.

Furthermore, the sample may be classified into more than one class, whereby a
more refined diagnostic tool is provided.

For many of the diseases mentioned above, it is of utmost importance that an early
diagnosis is obtained, but many of these diseases may be difficult to diagnose
conventionally at the early stage due to very subtle symptoms. By the present method
it is possible to get a clear indication of the disease at an early stage.

Furthermore, by the present invention it may be possible to reveal individuals
susceptible to a specific disease, due to the classification of relevant biological
samples from these individuals.

In particular in respect of cancer, the invention may be used to classify different
forms of cancer, including cancer in various organs. Thus, the invention may be
used to distinguish renal cancer from colon cancer, for example. Furthermore,
different stages of a cancer may be diagnosed, including precancerous stages.
Different degrees of cancer aggressiveness may also be diagnosed; for example, the
invention may identify high-risk cancer patients independently of whether the
underlying cancer is anatomically localised in for example breast, lung or colon.

It is likewise conceivable that the present invention can be used for screening of 
individuals to identify those suffering from a particular disease or those being sus- 
ceptible to a disease or those expected to suffer from the disease in the near future. 

Also, the present invention may reveal individuals at risk due to environmental haz- 
ards, job environment or the like. 

The present invention may also be used to diagnose a variety of abuse of medicine
and/or narcotics, for example in relation to control, or in unconscious or
semi-conscious individuals that have to be treated for their abuse.

Another aspect of the invention may be to detect physical and/or psychological
stress in an individual, and thereby detect persons at risk of acquiring
stress-related diseases.

Yet another aspect of the invention may be the detection of genetic modifications or
inherited risk by examining a biological sample by fluorescence spectroscopy.

Yet another aspect of the invention may be to provide a quantitative answer to the
degree of any of the above-mentioned situations (e.g. the amount of medicine, the
degree of risk etc.).

In a further embodiment the invention may be used as a tool for predicting the
responsiveness to a specific treatment for an individual suffering from a particular
disease. For example the invention may be used to classify individuals suffering from
cancer into classes of predicted responsiveness to chemotherapy, radiation and/or
operation. Another example may be to predict useful medication for depressed
individuals.

Examples 


Example 1 

Example of the use of multidimensional sensorial fluorescence data analysis 

Smokers/Non-smokers data

This example illustrates the usefulness of training a classification system on a small 
experimental data set. The data set comprises 18 samples taken from 14 individu- 
als. 

Sampling and measurements 

Urine was collected from several male persons and measured spectrofluorimetrically.
Approximately half of the test subjects were smokers, the other half non-smokers.
Some samples were measured the same day, others up to 4 days after sampling.
The samples were kept at -4°C until measurement was performed. The urine was
not diluted before measurement. Fluorescence spectra were recorded on a Perkin-Elmer
LS-50B spectrofluorimeter using front face illumination.

Scanning was performed from excitation wavelengths 230 nm to 500 nm and from
emission wavelengths 268 nm to 900 nm.

Data handling 

For each thus obtained fluorescence landscape, the obvious non-bilinear
parts (emission below excitation and zero- and first-order Rayleigh scatter) were
removed and the corresponding elements denoted 'missing'. This led to data of the
type shown in Figure 1.
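
The removal step described above may be sketched as follows. The sketch is an
illustration only: it assumes that the scatter bands lie near emission equal to
excitation and near emission equal to twice the excitation wavelength, and the band
half-width `band_nm` as well as the grid spacings are hypothetical values not taken
from the example.

```python
import numpy as np


def mask_landscape(eem, emission_nm, excitation_nm, band_nm=15.0):
    """Set non-bilinear regions of an (emission x excitation) landscape to NaN."""
    em = emission_nm[:, None]
    ex = excitation_nm[None, :]
    below = em < ex                                 # emission below excitation
    near_ex = np.abs(em - ex) <= band_nm            # assumed scatter band at em = ex
    near_2ex = np.abs(em - 2.0 * ex) <= band_nm     # assumed scatter band at em = 2*ex
    out = np.array(eem, dtype=float)                # copy, so the input is untouched
    out[below | near_ex | near_2ex] = np.nan        # NaN stands for 'missing'
    return out


# Scan ranges of the example; the 1 nm and 10 nm steps are assumptions.
emission_nm = np.arange(268.0, 901.0, 1.0)
excitation_nm = np.arange(230.0, 501.0, 10.0)
landscape = np.ones((emission_nm.size, excitation_nm.size))
masked = mask_landscape(landscape, emission_nm, excitation_nm)
```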

For each sample, $i$, a matrix $\mathbf{X}_i$ is thus obtained of size $J \times K$
with $J$ being the number of emission wavelengths and $K$ being the number of
excitation wavelengths. The whole data set is arranged in a three-way tensorial
structure with typical elements $x_{ijk}$, $i = 1,..,I$, $j = 1,..,J$, $k = 1,..,K$.
This three-way structure may, geometrically, be interpreted as a box of data, where
each horizontal slice corresponds to a specific sample, each vertical slice
corresponds to a specific emission wavelength, and each frontal slice corresponds to
a specific excitation wavelength.

In the following, this three-way tensorial array is matricized, i.e. rearranged
into a two-way matrix called $\mathbf{Z}$ where each row corresponds to a sample and
holds all combinations of excitation and emission. In this setup - interpreted as in
ordinary multivariate data analysis - there are 4003 variables (plus a fraction that
is removed because it contains either variables which are set to missing or variables
with extremely small variance).
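
The matricization and the removal of unusable variables may be sketched as follows.
The sketch assumes an array ordered as samples by emission by excitation, treats a
variable as missing if it is NaN for any sample, and uses a hypothetical variance
threshold `var_tol`; none of these choices is prescribed by the example.

```python
import numpy as np


def matricize(X3, var_tol=1e-12):
    """Unfold an (I x J x K) array into Z (I x retained variables).

    Columns containing missing values (NaN) or with variance not exceeding
    var_tol are removed. Returns Z and the boolean mask of retained columns.
    """
    n_samples = X3.shape[0]
    Z = X3.reshape(n_samples, -1)                # each row holds one sample
    complete = ~np.isnan(Z).any(axis=0)          # columns without missing values
    variance = np.zeros(Z.shape[1])
    variance[complete] = Z[:, complete].var(axis=0)
    keep = complete & (variance > var_tol)
    return Z[:, keep], keep


# Tiny synthetic array: 3 samples, 2 emission and 2 excitation wavelengths.
X3 = np.arange(12.0).reshape(3, 2, 2)
X3[0, 0, 0] = np.nan      # first variable is missing for one sample
X3[:, 0, 1] = 5.0         # second variable is constant across samples
Z, keep = matricize(X3)
```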

Data modelling

This data matrix is subjected (Matlab, version 5.2) to principal component analysis,
in which the 4003 data variables are replaced with three latent variables. These
latent variables are weighted averages of the original data variables, defined so that
the projection of Z onto the space spanned by these retains as much variation as
possible. That is, the latent variables T of size 18×3 are defined through a set of
weights, P (4003×3), as the solution to



$$\max_{P} \left\| Z P (P^{T} P)^{-1} P^{T} \right\|_F^2$$

where $\|\cdot\|_F$ denotes the Frobenius norm. Because the weight matrix is chosen to be
orthogonal (without loss of generality), this expression can be reduced to

$$\max_{P} \left\| Z P P^{T} \right\|_F^2$$

From this expression, the latent variables T are found as the coordinates of Z in the
reduced space defined by the truncated basis P

$$T = ZP.$$
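
One common way of solving this maximisation is through a singular value decomposition of Z. The sketch below is an assumption about implementation (the text only states that Matlab 5.2 was used); the function name `pca_scores` and the random stand-in data are hypothetical, and no centering is applied because none is stated for this example.

```python
import numpy as np

def pca_scores(Z, n_components=3):
    """Return weights P (variables x F) and scores T = Z P (samples x F).

    P holds the first F right singular vectors of Z. They are orthonormal
    and maximise ||Z P P^T||_F^2 over all orthonormal P with F columns.
    """
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T
    T = Z @ P
    return P, T

rng = np.random.default_rng(1)
Z = rng.standard_normal((18, 40))      # 18 samples; 40 stands in for the 4003 variables
P, T = pca_scores(Z, n_components=3)

captured = np.linalg.norm(Z @ P @ P.T, "fro") ** 2
# Another orthonormal basis of the same size cannot capture more variation.
Q, _ = np.linalg.qr(rng.standard_normal((40, 3)))
captured_random = np.linalg.norm(Z @ Q @ Q.T, "fro") ** 2
print(T.shape, captured >= captured_random)
```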

Note that the latent variables are found using no information about the status of the
persons (smoker or non-smoker). The latent variables provide a condensed picture
of the original 4003 variables, and a simple graphical representation of these is
sufficient to illustrate the power of this compression (Figure 2).

As can be seen in the plot, all non-smokers fall above the dashed line and all but
one smoker fall below the dashed line. Disregarding for now the one outstanding
sample (S3), it is easily seen that this simple plot provides a powerful tool for
assessing whether a person is a smoker or not. Simply by measuring the fluorescence
excitation-emission data from a urine sample, and projecting the obtained data onto
the factor weights found above, a set of scores on the three latent variables is
obtained for a new person. When these are plotted in the above plot, the position of
the sample above or below the dashed line enables an assessment of whether the
person is a smoker or not. In fact, the sample of the person identified as S2 in the
lower right part was left out of the initial analysis and only positioned after the plot was
generated. As can be seen, the position is correctly below the dashed line, indicating
that the person is a smoker. This graphical assessment can be automated in a
number of ways using appropriate pattern recognition techniques.
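
A very simple form of such automation is sketched below: the new sample is projected onto the weights and its position relative to a straight line is tested. The weights `P`, the line parameters `slope` and `intercept`, and the choice of score axes are all placeholders, since the text gives neither the equation of the dashed line nor numerical weights.

```python
import numpy as np

def project(z_new, P):
    """Scores of a new sample: its coordinates in the space spanned by P."""
    return z_new @ P

def above_line(scores, slope, intercept, x_axis=0, y_axis=1):
    """True if the point (scores[x_axis], scores[y_axis]) lies above the
    line y = slope * x + intercept."""
    return bool(scores[y_axis] > slope * scores[x_axis] + intercept)

# Placeholder weights for 5 variables and 3 latent variables (orthonormal columns).
P = np.eye(5, 3)
z_new = np.array([1.0, 0.5, 0.2, 0.1, 0.0])
t_new = project(z_new, P)

# Hypothetical dashed line; in the study, non-smokers fell above it.
label = "non-smoker" if above_line(t_new, slope=-1.0, intercept=0.0) else "smoker"
print(t_new, label)
```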

The one smoker sample located above the dashed line indicates an erroneous
condition. However, in this initial feasibility study, no detailed information on the
individual persons or their smoking habits was available. Hence, there can be
numerous reasons for this particular position, for example that the person had not
been smoking that particular day or the day before.

Example 2

Detecting fasting condition

A patient undergoing a surgical procedure in general anesthesia is exposed to a
relatively high risk if he or she has been eating or drinking before being intubated.
For a planned procedure the patients have been instructed not to ingest anything
before the surgical procedure. However, the patients (especially children or elderly
persons) will not always refrain from ingesting. In both planned and acute
procedures it is of help for the physician to know whether a patient has been eating or
not. Thus, such a test will be a feasible 'add-on' to other tests with low marginal
cost.

Samples and solutions 

This study includes 9 normal persons fasting and 14 normal persons not fasting.
The conditions for all 23 persons were identical except for the question of whether
the persons had been fasting or not.

For each person a blood sample was taken and blood plasma therefrom frozen. The
blood plasma samples were defrosted and measured at room temperature. The
samples were measured undiluted front face in a 1 mm cuvette on a Perkin Elmer
LS50B (Copenhagen University). The excitation wavelength interval range was
230-400 nm (10 nm steps) and the excitation and emission slits were 4 nm and 3 nm,
respectively. The scan rate was 1000 nm/min. In all, a total of 3834 variables were
measured for each sample. At each excitation wavelength, the emission was
removed in the range from 250 nm to 22 nm above the excitation wavelength in order to
remove Rayleigh scatter and other irrelevant phenomena [Bro 1998, Bro 1999].
Upon removal of these variables, a total of 2020 variables were retained. A typical
landscape is shown in Figure 3.
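
The removal step can be illustrated with a boolean mask over the excitation-emission grid. The sketch below assumes that a variable is kept only when its emission wavelength exceeds the excitation wavelength by more than 22 nm; the emission grid is hypothetical (the text does not give it), so the counts of 3834 and 2020 variables are not reproduced.

```python
import numpy as np

def scatter_mask(excitation_nm, emission_nm, margin_nm=22.0):
    """Boolean mask (excitation x emission) that is True where a variable
    is kept, i.e. where emission exceeds excitation + margin_nm."""
    ex = np.asarray(excitation_nm, dtype=float)[:, None]
    em = np.asarray(emission_nm, dtype=float)[None, :]
    return em > ex + margin_nm

excitation = np.arange(230, 401, 10)      # 230-400 nm in 10 nm steps
emission = np.arange(250, 601, 10)        # hypothetical emission grid
keep = scatter_mask(excitation, emission)
print(len(excitation), keep.shape, int(keep.sum()))
```

Variables where `keep` is False would be set to missing or dropped before the landscape is unfolded.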

Results

The data were fitted by a PCA model. The model indicated that at least up to six
components contained valid information. For the present purpose, it means that the
main systematic part of the fluorescence variation is retained in these six new
variables. A resulting score plot is shown below (Figure 4). It is a three-dimensional
scatter plot of scores one, two and five. Each point represents the relative position of
one person with respect to that person's fluorescence fingerprint. It is immediately
seen in the plot that all fasting persons appear to the right in the plot and all
non-fasting persons appear to the left.

The significance of this particular plot can be described as follows. In estimating the
six components in the PCA model, no use whatsoever has been made of the
fasting information. The PCA model is only based on the fluorescence data. The
empirical observation that it is possible to assign areas of the plot to only fasting persons
and areas to only non-fasting persons means that a discrimination between the two
groups has been achieved. Thus, for a person where it is unknown whether the
person is fasting, it is possible to measure a corresponding fluorescence landscape
under similar conditions and thereby obtain the scores for that particular person.
Inserting these scores in the plot above, it is then possible to evaluate or verify
whether the person is fasting or not by simply monitoring to which side of the
indicated line the point is positioned.

More elaborate decision rules can easily be envisioned using e.g. linear discriminant
analysis [Indahl et al. 1999], SIMCA [Wold & Dunn, III 1983] or some similar
classification approach. However, for this feasibility study it suffices to show that
discrimination is possible to achieve with multivariate analysis of fluorescence landscapes.

Example 3 

Analysis of colon cancer data

Having a simple tool for detecting colon cancer is a very interesting application of 
the current invention. 

Materials and methods 

The data gathered here include 77 samples (9 normal persons; 13 with benign
tumor; 11 Dukes A; 14 Dukes B; 15 Dukes C; 15 Dukes D). For each person a blood
sample was taken and blood plasma therefrom frozen. The blood plasma samples
were defrosted and measured at room temperature. The samples were measured
undiluted front face in a 1 mm cuvette on a Perkin Elmer LS50B (Copenhagen
University). The excitation wavelength interval range was 230-400 nm (10 nm steps)
and the excitation and emission slits were 4 nm and 3 nm, respectively. The scan
rate was 1000 nm/min. In all, a total of 3834 variables were measured for each
sample. At each excitation wavelength, the emission was removed in the range from
250 nm to 22 nm above the excitation wavelength in order to remove Rayleigh scatter
and other irrelevant phenomena [Bro 1998, Bro 1999]. Upon removal of these
variables, a total of 2020 variables were retained.

Results 

For each class of samples, a principal component analysis (PCA) model is fitted to
the fluorescence data. This is important for exploring the homogeneity of the group
and for eliminating obviously erroneous samples.

As an example, a PCA model of the data of persons with benign tumors is
discussed. In this group, there seem to be several individuals located distinctly
isolated (4490, 4499, 8319, 4506) in the score plots (Fig 5). The remaining persons are
situated in the same group in the score plot. The reasons for the behavior of the
outlying samples can be related to the patients (extreme patients in some sense), to
the sampling of the blood (extreme sampling in some sense) or to the actual
measurements. For example, for 4490 the person was later found to be incorrectly
classified as benign. For 4506 the technician noted that the suspension was cloudy,
indicating incorrect treatment of the sample. For 8319 the sample was noted to have
precipitated matter, whereas for 4499 no reason was found for its behavior, besides the
outlying behavior of the measured fluorescence data. In order to assure that the
subsequent results are as robust as possible, the four samples were excluded; the
three because of erroneous sample treatment or measurement and the fourth
because of assumed but unknown erroneous sample treatment or measurement.

Similar outlying samples were found in other groups as well. For example, for the group of
Dukes A one sample had a very different fluorescence pattern and was excluded.
For the Dukes B samples, four such samples were observed. For Dukes C only
sample VB106 is moderately outlying. For Dukes D, sample VB76 is moderately
outlying. All in all, 11 samples out of the 77 were excluded. An explanation of the
erroneous sample treatment or measurement was found for most of these samples,
which makes the decision to exclude these valid and reasonable. It must, however,
be borne in mind that for five of the samples, there was no explanation found for the
strange behavior. Hence, excluding these from the subsequent classification model
is somewhat hazardous, because similar correct samples might be anticipated in
real applications. Nevertheless, for the present feasibility study, the samples are
considered as outliers of which the cause is presently unknown.

In order to quantify how well these data can be used for screening for cancer, the
data were split up into two groups: persons without cancer (18) and persons with
cancer (48). A cross-validation was performed in the following way. One person was
left out in turn and subsequently a PCA model was fitted to each group of data. Thus
two PCA models were built. For the non-cancer group, five components were used
and for the cancer group seven components were used. The data from the left-out
person were subsequently fitted to the two independently obtained models, yielding 1)
a set of score values for the sample and 2) a set of residuals of fluorescence
variation of that sample that the model could not explain. For one model, the score
values of the new sample, t (1×F), and the scores from the calibration data, T (I×F), are
used for calculating the $T^2$ statistic as $T^2 = t (T^{T} T)^{-1} t^{T}$ and the Q statistic as $Q = e^{T} e$, where
e (J×1) is the vector of residual variation in the fluorescence data not explained by
the model. The ratios of these values to the corresponding confidence limits
obtained from the model are calculated (hence a value above one indicates that the
sample is different). These two ratios are squared and summed and the square root
is used to test for class membership. If this number is less than the square root of
two, the sample is assigned to the class. If the sample is assigned to both classes,
the one with the smallest number is the one chosen.
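
The decision rule described above can be sketched as follows. The confidence limits are taken as given inputs, since the text does not state how they were computed; the function names `class_distance` and `assign` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def class_distance(x_new, P, T_cal, t2_limit, q_limit):
    """Combined distance of a new sample to one class model.

    x_new    : (J,) preprocessed data of the left-out sample
    P        : (J, F) loadings of the class PCA model (orthonormal columns)
    T_cal    : (I, F) scores of the calibration samples
    t2_limit, q_limit : confidence limits from the model (assumed given)
    """
    t = x_new @ P                                   # scores, shape (F,)
    e = x_new - t @ P.T                             # unexplained residual, shape (J,)
    t2 = t @ np.linalg.solve(T_cal.T @ T_cal, t)    # T^2 = t (T'T)^-1 t'
    q = e @ e                                       # Q = e'e
    return float(np.sqrt((t2 / t2_limit) ** 2 + (q / q_limit) ** 2))

def assign(distances):
    """Pick the class with the smallest distance among those below sqrt(2);
    return None if the sample is accepted by no class."""
    accepted = {name: d for name, d in distances.items() if d < np.sqrt(2)}
    return min(accepted, key=accepted.get) if accepted else None

# Toy model with J = 4 variables and F = 2 components (hypothetical numbers).
P = np.eye(4, 2)
T_cal = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
x_new = np.array([1.0, 1.0, 0.5, 0.0])
d = class_distance(x_new, P, T_cal, t2_limit=2.0, q_limit=0.5)
print(round(d, 4))                                  # 0.7071
print(assign({"non-cancer": 1.2, "cancer": 0.7}))   # cancer
```

In the toy case both ratios equal 0.5, so the combined distance is the square root of 0.5, which is below the threshold of the square root of two.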

Using this approach the following classification result is obtained.

Table 1. Classification results

                          Normal    Cancer
Total                         18        48
Correctly classified           6        48
Incorrectly classified        12         0



It is observed that 82% are correctly classified and no false negatives are obtained. 
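
The 82% figure follows directly from the counts in Table 1, as the short calculation below shows. The per-class rates are added here for illustration and are not quoted in the text.

```python
# Counts taken from Table 1.
correct = {"normal": 6, "cancer": 48}
incorrect = {"normal": 12, "cancer": 0}

total = sum(correct.values()) + sum(incorrect.values())
accuracy = sum(correct.values()) / total                                # overall fraction correct
cancer_rate = correct["cancer"] / (correct["cancer"] + incorrect["cancer"])
normal_rate = correct["normal"] / (correct["normal"] + incorrect["normal"])
print(f"{accuracy:.0%} {cancer_rate:.0%} {normal_rate:.0%}")            # 82% 100% 33%
```

All 48 cancer samples are correctly classified, whereas only one third of the 18 non-cancer samples are.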

Example 4

Fluorescence measurements of urine from cardiac patients 

This example illustrates the treatment of outlying data and the ability of the
invention to classify patients according to cardiac problems.

Samples 

Eight urine samples were collected from seven men and one woman
(post-menopause) who all were diagnosed with angina pectoris (samples #1-8). No
other information was available from these patients. For comparison, urine
samples were collected from five arbitrarily chosen men (samples #9-13).

Measurements 

Excitation-emission matrices were measured on the undiluted samples in a
cuvette with 2 mm light path using front-face geometry on a Perkin-Elmer
LS50B spectrofluorometer. In 28 consecutive scans, the excitation
wavelength was shifted in 10 nm steps from 230 to 500 nm. Emission intensity
was recorded starting 20 nm after the excitation wavelength until two times
the excitation wavelength minus offset (or 900 nm). Thus, neither first- nor
second-order Rayleigh scatter was recorded. Emission intensity was
measured in intervals of 0.5 nm. Spectral bandwidth on both monochromators
was 5 nm. Scan rate was 500 nm/min.

In total, fluorescence intensity was measured at 17828 different combinations
of excitation and emission wavelengths and the values exported to Matlab,
version 5.2.1.

Results

Although the data have a three-way structure (samples x excitation
wavelength x emission wavelength) this is disregarded and the data are
rearranged to a two-way structure (samples x combination of excitation and
emission wavelength) as illustrated in Figure 6. In this figure the 28 spectra, being
arranged successively on an increasing wavelength scale, are displayed in
an overlay fashion. The thus obtained two-way matrix is centered by subtracting
from each column its average value. By means of Principal Component
Analysis this centered matrix is modelled by three principal components
obtained from a singular value decomposition of the centered matrix.

An initial analysis reveals that patient #5 is quite extreme as compared to the
remaining patients. This is illustrated in Figure 7 in a so-called influence plot. In a
two-component model patient five has a very large residual variation (upper left corner)
whereas in a three-component model patient five has an extremely high leverage
(lower right corner). This result shows that after two components most data are well
described except for patient five. Consequently, in the third component the fifth
patient gets a high leverage, which means that this patient is determining this
component. This is a typical example of an extreme outlier. If the cause for the extreme
behaviour is instrumental, the patient's data must be excluded as an incorrect
measurement. If the cause is biological diversity, this diversity must be better
represented in the data by incorporating more similar samples. As there are no further
data in this specific investigation, the only suitable procedure is to exclude this
sample.
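
The two quantities shown in an influence plot, leverage and residual variation per sample, can be computed from the singular value decomposition of the centered matrix. The sketch below uses simulated data in which one sample deviates from an otherwise two-component structure; the data, the function name `influence` and all numerical settings are assumptions and do not reproduce the measurements of this example.

```python
import numpy as np

def influence(X, n_components):
    """Leverage and residual sum of squares per sample for a PCA model
    of the column-centred matrix X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_f = U[:, :n_components]
    leverage = np.sum(U_f ** 2, axis=1)                  # diagonal of T (T'T)^-1 T'
    X_hat = (U_f * s[:n_components]) @ Vt[:n_components]
    residual = np.sum((Xc - X_hat) ** 2, axis=1)
    return leverage, residual

# Simulated data: 13 samples, 50 variables, two systematic components.
w = np.linspace(0.0, 1.0, 50)                            # hypothetical wavelength axis
angles = 2 * np.pi * np.arange(13) / 13
scores = np.column_stack([np.cos(angles), np.sin(angles)])
loadings = np.vstack([np.exp(-((w - 0.3) / 0.1) ** 2),
                      np.exp(-((w - 0.7) / 0.1) ** 2)])
rng = np.random.default_rng(2)
X = scores @ loadings + 0.01 * rng.standard_normal((13, 50))
X[4] += 0.5 * np.sin(40 * w)                             # make sample #5 (index 4) deviate

lev2, res2 = influence(X, 2)
lev3, res3 = influence(X, 3)
print(int(np.argmax(res2)), int(np.argmax(lev3)))
```

With two components the deviating sample is expected to show the largest residual; with three components its deviation is absorbed by the third component and appears as high leverage instead.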

Indirectly, the appearance of the outlying sample is an important illustration of
one of the very important benefits of using exploratory data analysis and
having many physical parameters at disposal. Had it not been possible to
detect the outlying sample, conclusions from the analysis could have been
misleading. The model would be reflecting the difference between the
samples as such and the extreme sample five, rather than explaining the
inter-differences and patterns between all samples. The availability and use of this
evaluating tool during the model-building step shows that quality-deteriorating
samples can be excluded, thus leading to improved models with improved
validity.



Refitting the principal component model without sample five, a score plot is
obtained as shown in Figure 8. Only the first two score vectors, PC1 and
PC2, are displayed. The most important latent variables, PC1 and PC2,
represent 97% of the original variation in the fluorescence data obtained in this
investigation. The samples separate into two distinct clusters: those below
the dashed line in the lower left corner all represent samples from persons
diagnosed with angina pectoris, while the samples in the upper right corner all
represent persons that are not diagnosed with angina pectoris. Thus, it is clearly
possible to separate diseased from healthy persons based on the
fluorescence data alone.

It is indeed a significant finding that the fluorescence data so clearly
separate the two groups. Importantly, it is not merely the intensity of fluorescence
that separates the patients. Differences in intensities are normally reflected in
the first principal component of spectral data. In this case, however, the
second component is also important for obtaining separation. In fact, the third
component - not shown - also helps in obtaining further separation. This
result indicates that more subtle spectral components can be increasingly
helpful in the discrimination between the groups.

As a larger data set becomes available the procedure outlined here will easily
be formalised into a classification model that can identify persons with
cardiac problems within a population of otherwise healthy individuals.

Example 5 

Fluorescence measurements of urine samples with added bacteria 


The purpose of the example was to investigate if fluorescence spectra measured 
directly on urine samples correlate with different added levels of bacteria in the 
urine. 

Samples 

Seven urine samples were spiked with $10^{2}$ to $10^{8}$ E. coli bacteria per ml, and a control
sample with no bacteria added was included. The eight samples were delivered by Alice
Friis-Møller, Hvidovre Hospital, and kept in a freezer until measurement.

Measurements

The samples were measured at front face at room temperature in a 2 mm cuvette on
a Perkin Elmer LS50B (Copenhagen University). The excitation wavelength interval
range was 230-400 nm (10 nm steps) and the excitation and emission slits were 4
nm and 3 nm, respectively. The scan rate was 1000 nm/min. The data were
imported to Matlab using every 5th emission wavelength, giving a step of 2.5 nm in the
emission scans. An important note is that the samples were measured in a
sequence corresponding to the increase in bacteria content.

Results

Raw data

The raw unfolded data are averaged with a factor 10 over the variables so the matrix
dimensions become 8 samples x 384 variables. This corresponds to circa 25 nm
steps in the emission scans (Fig. 9).
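
Averaging by a factor of 10 can be sketched as a block mean over neighbouring columns. The sketch assumes 3840 unfolded columns so that 384 remain, and it averages over consecutive columns without regard to where one emission scan ends and the next begins; the text does not say how scan boundaries or missing values were handled.

```python
import numpy as np

def block_average(Z, factor=10):
    """Average each run of `factor` neighbouring columns of Z.
    Trailing columns that do not fill a complete block are discarded."""
    n_samples, n_vars = Z.shape
    n_blocks = n_vars // factor
    trimmed = Z[:, : n_blocks * factor]
    return trimmed.reshape(n_samples, n_blocks, factor).mean(axis=2)

# Placeholder data: 8 samples x 3840 variables (column count assumed).
Z = np.arange(8 * 3840, dtype=float).reshape(8, 3840)
Z_avg = block_average(Z, factor=10)
print(Z_avg.shape)      # (8, 384)
```

With an emission step of 2.5 nm, each averaged variable spans ten points, i.e. about 25 nm.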

PCA

A PCA is performed on both the mean centered and the auto scaled unfolded
spectral data. Variables with standard deviation of 0 and variables including missing
values (NaNs) are excluded.

In Figures 10 and 11 the score plots from these models are shown. No large
differences are seen between the auto scaled and the mean centered models.
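
The two pretreatments differ only in whether each variable is divided by its standard deviation after centering. A minimal sketch, with a hypothetical function name `preprocess` and toy data, is:

```python
import numpy as np

def preprocess(Z, autoscale=False):
    """Drop variables with missing values or zero standard deviation,
    then mean-centre (and optionally scale to unit standard deviation)."""
    valid = ~np.isnan(Z).any(axis=0)
    std = np.zeros(Z.shape[1])
    std[valid] = Z[:, valid].std(axis=0, ddof=1)
    keep = valid & (std > 0)
    Zc = Z[:, keep] - Z[:, keep].mean(axis=0)
    if autoscale:
        Zc = Zc / std[keep]
    return Zc, keep

Z = np.array([[1.0, 5.0, np.nan, 2.0],
              [2.0, 5.0, 1.0,    4.0],
              [3.0, 5.0, 2.0,    9.0]])
Zc, keep = preprocess(Z, autoscale=True)
print(keep.tolist(), Zc.shape)   # [True, False, False, True] (3, 2)
```

The constant second column and the third column with a missing value are excluded; the remaining columns have zero mean and unit standard deviation.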

PC1 clearly reflects the increase in bacteria content.

PLS models

Mean centered PLS models with fluorescence spectra as the independent variables
and the logarithm of the bacteria content ($10^{2}$ to $10^{8}$) as the dependent variable are
developed.

It is observed that there is a strongly non-linear relationship between the spectra
and the number of bacteria cells, but the function is monotone (Fig. 12).

Conclusions

There seems to be a non-linear relationship between the spectra and the number of
bacteria cells. This can be corrected for by some non-linear transformation of e.g.
the dependent variable. Furthermore, it is important to note that the measurement
order is crucial and should be randomised in a follow-up study.
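
For illustration, a mean centered PLS model with the logarithm of the bacteria content as the dependent variable can be sketched with the NIPALS algorithm for a single response (PLS1). The text does not state which PLS algorithm was used, and the simulated spectra below are constructed to be linear in the logarithm of the cell count, so the sketch shows the mechanics only and not the non-linear behaviour observed in the study.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression (NIPALS) on mean-centred X and y.
    Returns the regression vector b and the means needed for prediction."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)        # weight vector
        t = E @ w                        # scores
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        q_a = f @ t / tt                 # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - q_a * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, x_mean, y_mean

def pls1_predict(X, b, x_mean, y_mean):
    return (X - x_mean) @ b + y_mean

# Simulated example: seven spiked samples with 10^2 ... 10^8 cells per ml.
counts = 10.0 ** np.arange(2, 9)
y = np.log10(counts)                               # dependent variable: 2, 3, ..., 8
axis = np.linspace(0.0, 1.0, 30)                   # hypothetical spectral axis
profile = np.exp(-((axis - 0.5) / 0.2) ** 2)       # hypothetical spectral profile
rng = np.random.default_rng(3)
X = np.outer(y, profile) + 0.001 * rng.standard_normal((7, 30))

b, x_mean, y_mean = pls1_fit(X, y, n_components=1)
y_hat = pls1_predict(X, b, x_mean, y_mean)
print(np.round(y_hat, 2))
```

The control sample without added bacteria is left out of the sketch because the logarithm of zero is undefined.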



Example 6 

Basic fluorescence measurements of blood plasma 



The purpose of the example was to investigate the effect of dilution and pH on the 
fluorescence spectra measured on blood plasma. Both transmission and front face 
sample presentations were investigated. 

Samples 

A 10 ml pool of blood plasma samples from Hvidovre Hospital was produced.
Samples were taken from this pool and diluted with 0.9% sterile NaCl, and the following
dilutions were performed for front face:



1:2       3.5 ml pool   + 3.5 ml NaCl (A)
1:5       2.0 ml A      + 3.0 ml NaCl (B)
1:10      1.0 ml A      + 4.0 ml NaCl (C)
1:25      2.0 ml C      + 3.0 ml NaCl
1:50      0.5 ml B      + 4.5 ml NaCl
1:100     1.0 ml pool   + 99.0 ml NaCl (F)
1:200     2.5 ml F      + 2.5 ml NaCl
1:500     1.0 ml F      + 4.0 ml NaCl
1:700     1.0 ml F      + 6.0 ml NaCl
1:2000    0.25 ml F     + 4.75 ml NaCl
1:3000    0.20 ml F     + 5.8 ml NaCl
1:5000    0.10 ml F     + 4.9 ml NaCl



and for transmission:

1:2       2.8 ml pool   + 2.8 ml NaCl (A)
1:5       2.0 ml A      + 3.0 ml NaCl (B)
1:10      1.0 ml A      + 4.0 ml NaCl (C)
1:25      2.0 ml C      + 3.0 ml NaCl
1:50      0.5 ml B      + 4.5 ml NaCl
1:100     0.25 ml pool  + 24.75 ml NaCl (F)
1:200     2.5 ml F      + 2.5 ml NaCl
1:500     1.0 ml F      + 4.0 ml NaCl
1:700     1.0 ml F      + 6.0 ml NaCl
1:1000    0.3 ml F      + 2.7 ml NaCl
1:2000    0.25 ml F     + 4.75 ml NaCl
1:3000    0.20 ml F     + 5.8 ml NaCl
1:5000    0.10 ml F     + 4.9 ml NaCl
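
The nominal dilution factors of the transmission series can be checked from the listed volumes. The following sketch uses exact fractions; the helper name `dilute` is an assumption for the example.

```python
from fractions import Fraction

def dilute(stock, stock_ml, diluent_ml):
    """Concentration (relative to the pool) after mixing stock with diluent."""
    s, d = Fraction(stock_ml), Fraction(diluent_ml)
    return stock * s / (s + d)

pool = Fraction(1)
A = dilute(pool, "2.8", "2.8")        # 1:2
B = dilute(A, "2.0", "3.0")           # 1:5
C = dilute(A, "1.0", "4.0")           # 1:10
F = dilute(pool, "0.25", "24.75")     # 1:100
series = {
    "1:25": dilute(C, "2.0", "3.0"),
    "1:50": dilute(B, "0.5", "4.5"),
    "1:200": dilute(F, "2.5", "2.5"),
    "1:500": dilute(F, "1.0", "4.0"),
    "1:700": dilute(F, "1.0", "6.0"),
    "1:1000": dilute(F, "0.3", "2.7"),
    "1:2000": dilute(F, "0.25", "4.75"),
    "1:3000": dilute(F, "0.20", "5.8"),
    "1:5000": dilute(F, "0.10", "4.9"),
}
# Each computed concentration matches its nominal label.
for label, conc in series.items():
    assert conc == Fraction(1, int(label.split(":")[1])), label
print(A, B, C, F)   # 1/2 1/5 1/10 1/100
```

All listed volumes are consistent with the nominal dilution factors.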

Buffers were produced as follows:

pH circa 9: 0.1 M NaH2PO4 (1.785 g NaH2PO4 to 100 ml H2O).
pH circa 4: 0.1 M Na2HPO4 (1.382 g Na2HPO4 to 100 ml H2O).
pH circa 7: 1.379 g NaH2PO4 + 1.787 g Na2HPO4 to 100 ml H2O.
pH circa 1: 0.1 M HCl (0.81 ml concentrated HCl to 100 ml H2O).

and dilutions for the buffer experiment were:

1:2       2.0 ml pool   + 2.0 ml buffer
1:10      0.4 ml pool   + 3.6 ml buffer (B)
1:200     0.2 ml B      + 3.8 ml buffer
1:1000    0.1 ml B      + 9.9 ml buffer

All possible combinations of pH levels and dilutions were measured, resulting in 16
(4x4) spectral landscapes. Both front face and transmission were tested.

Measurements 

The samples were defrosted and measured at room temperature. The samples were 
measured in a standard 10x10 mm cuvette on a Perkin Elmer LS50B. The excitation 
wavelength interval range was 230-400 nm (10 nm steps) and the excitation and 
emission slits were 4 nm and 3 nm, respectively. The scan rate was 1000 nm/min. 
The data were imported to Matlab using every 5th emission wavelength, giving a step
of 2.5 nm in the emission scans.

Results 

Experiment 1: Dilution of blood samples measured in front face mode 

Raw data 

In Figure 13 examples of front face fluorescence spectra of an undiluted sample and
the same sample diluted 1:5000 are shown. Very different spectral signals both with
respect to intensity and shape are obtained for the two samples.

At low excitation wavelengths the signal intensities at first increase with dilution
(probably due to quenching) followed by a decrease in signal intensity by further
dilution. At high excitation wavelengths an (almost linear) decrease in signal intensity
is seen with dilution. This is illustrated in Figures 14 and 15.

PCA

A PCA is performed on the mean centered unfolded spectral data. Variables with
standard deviation of 0 and variables including missing values (NaNs) are excluded.
Systematic score patterns are seen for up to 5 to 6 PCs and Figure 16 shows the
scores for 1 to 4 PCs. Variance explained is 79.38%, 16.32%, 2.60%, 1.45% and
0.24% for the first 5 PCs, respectively.

Experiment 2: Dilution of blood samples measured in transmission mode 

Raw data 

In Figure 17 examples of transmission fluorescence spectra of an undiluted sample
and the same sample diluted 1:5000 are shown. Again, very different spectral
signals both with respect to intensity and shape are obtained for the two samples.

At low excitation wavelengths the signal intensities at first increase with dilution
(probably due to quenching) followed by a decrease in signal intensity by further
dilution. At high excitation wavelengths an (almost linear) decrease in signal intensity
is seen with dilution. This is illustrated in Figures 18 and 19.

PCA 

A PCA is performed on the mean centered unfolded spectral data. Variables with
standard deviation of 0 and variables including missing values (NaNs) are excluded.
Systematic score patterns are seen for up to 5 to 6 PCs as for front face mode and
Figure 20 shows the scores for 1 to 4 PCs. Variance explained is 85.43%, 11.33%,
1.91%, 0.99% and 0.30% for the first 5 PCs, respectively.

PLS models

Mean centered PLS models with fluorescence spectra as the independent variables
and the dilution factor as the dependent variable are developed for both front face
and transmission mode measurements.

It is observed that there is a linear relationship from dilution factor 1:25 to 1:5000 for
front face and from 1:50 to 1:5000 for transmission. It seems that the relationship is
non-linear below these dilution factors. PLS modelling was also tested with auto scaled data to give
the high excitation wavelengths more influence. Equal results were obtained,
although the linear range of the models could be expanded to approx. 1:10.
PLS modelling was also tested with log(dilution factor) and nice linear models over
the whole range were obtained. Only the undiluted sample seemed to deviate a
little.

Experiment 3: Effect of pH & dilution on the measured spectra

PCA

PCA is performed on each of the data sets recorded in transmission and front face
mode. A score plot from a PCA on all samples is shown in Figure 23.

No huge differences are seen with respect to pH levels, see also Figure 24.

Conclusions

It is important to measure at least two (or even better three or four) different dilutions
of the blood plasma samples: the undiluted sample and the same sample diluted 1:2
in front face and diluted 1:200/1:100 in transmission. Not surprisingly, front face
intensities are higher for samples measured at no or low dilution factors, while the
opposite holds for samples with high dilution factors. Note that it is possible to
measure in transmission mode on the undiluted sample. The tested pH levels do not
seem to have large effects on the measured spectral shapes or intensities.

1. A method of training a classification system for characterising a biological
sample with respect to at least one condition, comprising

a) obtaining a biological sample from an animal, including a human, wherein 
said biological sample is selected from body fluids and/or tissue, wherein 
the tissue sample is not associated with said condition(s), 

b) obtaining characterisation information related to each biological sample, 

c) exposing the sample to excitation light within a predetermined range of 
wavelength, 

d) determining physical parameter(s) of light emitted from the sample,

e) repeating step a) to d) until the physical parameters of all training samples 
have been determined, 

f) optionally performing a data handling of the obtained physical parameters 
obtaining data variables, 

g) optionally performing a multivariate data analysis of the data variables and 
optionally of characterisation information obtaining model parameters de- 
scribing the variation of the data variables, 



h) classifying the biological samples into at least two different classes corre- 
lated to the characterisation information, obtaining a trained classification 
system. 



2. The method according to claim 1, whereby step g) further comprises selection of 
latent variables being weighted averages of data variables. 

3. The method according to claim 1, wherein the biological sample is selected from
blood, serum, plasma, saliva, urine, milk, cerebrospinal fluid, tears, nasal
secrete, semen, bile, lymph, sweat and/or faeces.

4. The method according to claim 1, wherein the biological sample is a tissue
sample.

5. The method according to claim 4, wherein the tissue sample is a biopsy of tissue 
selected from muscle, cutis, subcutis, kidney, brain, and liver or a sample of hair 
or nails. 



6. The method according to claim 3, wherein the biological sample is urine, milk, 
blood, plasma or serum. 

7. The method according to claim 1, wherein the wavelength of the excitation light
is in the range of from 100 nm to 1000 nm, such as from 100 to 800 nm.

8. The method according to claim 7, wherein the wavelength of the excitation light 
is in the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm. 



9. The method according to claim 1, wherein the physical parameter determined is
selected from fluorescence intensity, fluorescence lifetime, phosphorescence
intensity, phosphorescence lifetime, polarisation, polarisation lifetime,
anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised
fluorescence, fluorescence-detected circular dichroism, and any time dependence of
the two last mentioned parameters.

10. The method according to claim 1, wherein the spectral distribution of light
emitted ranging from 200 nm to 800 nm is generated.

11. The method according to claim 2, wherein the ratio of number of training
samples to the expected number of latent variables is at least 5:1, preferably at least
10:1.

12. The method according to claim 1, wherein the multivariate data analysis is
selected from: principal component analysis, principal component regression,
factor analysis, partial least squares, fuzzy clustering, artificial neural networks,
parallel factor analysis, Tucker models, generalised rank annihilation method,
locally weighted regression, ridge regression, total least squares, principal
covariates regression, Kohonen networks, linear or quadratic discriminant analysis,
k-nearest neighbours based on rank-reduced distances, multilinear regression
methods, soft independent modelling of class analogies, robustified versions of
the above and/or obvious non-linear versions such as one obtained by allowing
for interactions or crossproducts of variables, exponential transformations etc.

13. The method according to claim 1, wherein the data handling of step f) is selected
from a one-way matrix of spectral information, a two-way matrix of spectral
information, a three-way matrix of spectral information, a four-way matrix of
spectral information and a five-way or higher order matrix of spectral information.

14. The method according to claim 1, wherein other variable(s) is included in the 
multivariate analysis of step g). 

15. The method according to claim 14, wherein the other variable(s) is selected from
a pH value of the sample, concentration of various electrolytes in the sample,
concentration of any other relevant compound in the sample, temperature,
chemical parameters or any other physical property of the sample.

16. The method according to claim 1, wherein other variable(s) related to the animal,
including a human being, is included in the multivariate analysis of step g).

17. The method according to claim 16, wherein the other variable(s) is selected from
any parameter relating to the bodily or mental condition, hair colour, skin colour,
age, sex, geographic origin, affiliation, hereditary background, stress level,
medical diagnosis, subjective evaluations or clinical parameters.

18. The method according to claim 1, wherein the sample is pre-treated before
subjecting the sample to step c).

19. The method according to claim 18, wherein the pre-treatment comprises
adjustment of pH of the sample to a predetermined value.

20. The method according to claim 1, wherein a classification system for diagnostic 
purposes with relation to heart diseases is obtained. 

21. The method according to claim 1, wherein a classification system for diagnostic 
purposes with relation to abuse of medicine or narcotics is obtained. 

22. A diagnostic classification system comprising 

a) a sample domain for comprising a biological sample, 

b) light means for exposing the sample to excitation light in the sample domain, 

c) a detecting means recording the physical parameter(s) of light emitted from 
the sample, 

d) optionally computing means for performing data handling of the physical
parameters, obtaining data variables,

e) optionally processing means for providing model parameters from data
variables of the sample,

f) at least one storage means for storing physical parameters and/or data
variables and/or model parameters of the biological sample,

g) at least one storage means for storing physical parameters and/or data
variables and/or model parameters and characterisation information of a trained
classification system,

h) means for correlating physical parameters and/or data variables and/or
model parameters from the sample with physical parameters and/or data
variables and/or model parameters of the trained system, and

i) means for displaying the characterisation class(es) of a sample.

23. The system according to claim 22, wherein the model parameters are latent 
variables being weighted averages of the data variables. 
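
A minimal sketch of latent variables formed as weighted combinations of the data variables is given below. It assumes principal component analysis computed by singular value decomposition as one common way of obtaining such weights; the claims do not name a specific decomposition, and the function name `latent_variables` and the random placeholder data are hypothetical.

```python
import numpy as np


def latent_variables(X, n_components):
    """Return scores, loadings and the column means of X.

    Each score is a weighted sum of the mean-centred data variables,
    with the weights given by one column of the loadings.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal weight vectors over the data variables.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T   # (variables x components)
    scores = Xc @ loadings           # (samples x components)
    return scores, loadings, mean


rng = np.random.default_rng(1)
X = rng.random((6, 15))              # placeholder data variables
scores, loadings, mean = latent_variables(X, n_components=2)
```

Here the 15 data variables of each sample are condensed into 2 latent variables, which can then serve as the model parameters stored and compared by the system.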

24. The system according to claim 22, wherein the biological sample is a liquid
sample, such as a sample selected from blood, serum, saliva, milk, urine,
cerebrospinal fluid, tears, nasal secretion, semen, bile, lymph, sweat and/or faeces.

25. The system according to claim 22, wherein the biological sample is a tissue
sample. 

26. The system according to claim 25, wherein the tissue sample is a biopsy of
tissue selected from muscle, cutis, subcutis, kidney, brain, and liver or a sample of
hair or nails.

27. The system according to claim 24, wherein the biological sample is urine, milk, 
blood, plasma or serum. 

28. The system according to claim 22, wherein the light means is arranged to emit
light having a wavelength in the range of from 100 nm to 1000 nm, such as from
100 nm to 800 nm.

29. The system according to claim 28, wherein the light means is arranged to emit
light having a wavelength in the range of from 200 nm to 800 nm, such as from
200 nm to 600 nm.

30. The system according to claim 22, wherein the physical parameter determined is
selected from fluorescence intensity, fluorescence lifetime, phosphorescence
intensity, phosphorescence lifetime, polarisation, polarisation lifetime,
anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised
fluorescence, fluorescence-detected circular dichroism, and any time dependence of
the two last mentioned parameters.

31. The system according to claim 22, wherein the detecting means is selected from
a photomultiplier, a scanning camera, for example a vidicon, a CCD camera, a 
CMOS, or a diode array. 



32. The system according to claim 22, being divided into at least a first unit and a 
second unit, wherein said first unit comprises the parts a) to at least c) of the 
system, and the second unit comprises the other parts. 

33. The system according to claim 22, further including means for measuring other
variable(s) of the sample. 

34. The system according to claim 33, wherein the other variable(s) is selected from
a pH value of the sample, concentration of various electrolytes in the sample,
concentration of any other relevant compound in the sample, temperature,
chemical parameters or any other physical property of the sample.

35. The system according to claim 22, further including means for entering other 
variables. 

36. The system according to claim 35, wherein the other variable(s) is selected from 
any parameter relating to the bodily or mental condition, hair colour, skin colour, 
age, sex, geographic origin, affiliation, hereditary background, stress level, 
medical diagnosis, subjective evaluations or clinical parameters. 

37. The system according to claim 22, wherein the sample is pre-treated before 
subjecting the sample to step b). 

38. The system according to claim 37, wherein the pre-treatment comprises
adjustment of pH of the sample to a predetermined value.

39. The system according to claim 22, being a classification system for diagnostic 
purposes with relation to heart diseases. 

40. The system according to claim 22, being a classification system for diagnostic
purposes with relation to abuse of medicine or narcotics.

41. A method for characterising a biological sample of an animal, including a human,
comprising 

a) obtaining a biological sample from the animal or human, 

b) exposing the sample to excitation light, 

c) determining the physical parameter(s) of light emitted from the sample, 

d) optionally performing a data handling of the obtained physical parameters,
obtaining data variables,

e) storing the physical parameters and/or data variables and/or model
parameters,

f) optionally providing model parameters from data variables of the sample,

g) obtaining physical parameters and/or data variables and/or model
parameters from a trained classification system,

h) correlating physical parameters and/or data variables and/or model
parameters from the sample with physical parameters and/or data variables
and/or model parameters of the trained system, and

i) displaying characterisation class(es) of the sample. 

42. The method according to claim 41, wherein the model parameters are latent 
variables being weighted averages of the data variables. 
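
Steps d) to i) of claim 41 can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the application: principal component scores stand in for the latent variables, a k-nearest-neighbour vote in score space stands in for the correlating step, the training data are synthetic, and the class labels `"reference"` and `"condition"` as well as all function names are hypothetical.

```python
import numpy as np


def fit_model(X_train, n_components):
    """Derive model parameters (mean and loadings) from training data."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def project(X, mean, loadings):
    """Step f): provide latent variables (scores) for data variables."""
    return (np.atleast_2d(X) - mean) @ loadings


def classify(sample_scores, train_scores, train_labels, k=3):
    """Step h): compare the sample with the trained system by a
    k-nearest-neighbour vote on Euclidean distance in score space."""
    distances = np.linalg.norm(train_scores - sample_scores, axis=1)
    nearest = train_labels[np.argsort(distances)[:k]]
    classes, counts = np.unique(nearest, return_counts=True)
    return classes[np.argmax(counts)]


# Synthetic training set: two well-separated classes, 10 samples each.
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0.0, 0.1, (10, 15)),
                     rng.normal(1.0, 0.1, (10, 15))])
y_train = np.array(["reference"] * 10 + ["condition"] * 10)

# Step g): model parameters of the trained classification system.
mean, loadings = fit_model(X_train, n_components=2)
train_scores = project(X_train, mean, loadings)

new_sample = rng.normal(1.0, 0.1, 15)             # steps a) to d), simulated
new_scores = project(new_sample, mean, loadings)  # step f)
predicted = classify(new_scores[0], train_scores, y_train)  # step h)
print(predicted)                                  # step i)
```

Classifying in the reduced score space rather than on the raw data variables corresponds to computing distances on a rank-reduced representation of the spectra.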

43. The method according to claim 41, wherein the biological sample is selected
from blood, serum, plasma, saliva, urine, cerebrospinal fluid, tears, nasal
secretion, semen, milk, bile, lymph, sweat and/or faeces.

44. The method according to claim 41, wherein the biological sample is a tissue
sample. 

45. The method according to claim 41, wherein the tissue sample is a biopsy of
tissue selected from muscle, cutis, subcutis, kidney, brain, and liver.

46. The method according to claim 43, wherein the biological sample is urine, blood, 
milk, or serum. 

47. The method according to claim 41, wherein the wavelength of the excitation light
is in the range of from 100 nm to 1000 nm, such as from 100 nm to 800 nm.

48. The method according to claim 41, wherein the wavelength of the excitation light 
is in the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm. 

49. The method according to claim 41, wherein the physical parameter determined
is selected from fluorescence intensity, fluorescence lifetime, phosphorescence
intensity, phosphorescence lifetime, polarisation, polarisation lifetime,
anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised
fluorescence, fluorescence-detected circular dichroism, and any time dependence of
the two last mentioned parameters.

50. The method according to claim 41, wherein the spectral distribution of light
emitted ranging from 200 nm to 800 nm is generated.

51. The method according to claim 41, wherein the data handling of step d) is
selected from a one-way matrix of spectral information, a two-way matrix of
spectral information, a three-way matrix of spectral information, a four-way matrix
of spectral information, and a five-way or higher order matrix of spectral
information.

52. The method according to claim 41, wherein other variable(s) is included as data 
variables. 

53. The method according to claim 52, wherein the other variable(s) is selected from
a pH value of the sample, concentration of various electrolytes in the sample, 
concentration of any other relevant compound in the sample, temperature, 
chemical parameters or any other physical property of the sample. 

54. The method according to claim 41, wherein other variable(s) related to the
animal, including a human being, is included as data variables.

55. The method according to claim 54, wherein the other variable(s) is selected from
any parameter relating to the bodily or mental condition, hair colour, skin colour,
age, sex, geographic origin, affiliation, hereditary background, stress level,
medical diagnosis, subjective evaluations or clinical parameters.

56. The method according to claim 41, wherein the sample is pre-treated before
subjecting the sample to step b).

57. The method according to claim 56, wherein the pre-treatment comprises
adjustment of pH of the sample to a predetermined value.

58. The method according to claim 41, wherein the trained classification system is a
diagnostic heart disease classification system. 

59. The method according to claim 41, wherein the trained classification system is a 
diagnostic abuse classification system related to abuse of medicine or narcotics. 